Abstract. Let M be a bounded domain of $\mathbb{R}^d$ with smooth boundary. We relate the Cheeger
constant of M and the conductance of a neighborhood graph defined on a random sample from M. By
restricting the minimization defining the latter over a particular class of subsets, we obtain
consistency (after normalization) as the sample size increases, and show that any minimizing
sequence of subsets has a subsequence converging to a Cheeger set of M.

Index Terms: Cheeger isoperimetric constant of a manifold, conductance of a graph, neighborhood 
graph, spectral clustering, U-processes, empirical processes. 

AMS 2000 Classification: 62G05, 62G20. 

1 Introduction and main results 

The Cheeger isoperimetric constant may be defined for a Euclidean domain as well as for a graph. 
In either case it quantifies how well the set can be bisected or 'cut' into two pieces that are as little 
connected as possible. Motivated by recent developments in spectral clustering and computational 
geometry, we relate the Cheeger constant of a neighborhood graph defined on a sample from a 
domain and the Cheeger constant of the domain itself. 

Given a graph G with weights $\{\delta_{ij}\}$, the normalized cut of a subset $S \subset G$ is defined as

$$h(S; G) = \frac{\sigma(S)}{\min\{\delta(S), \delta(S^c)\}}, \qquad (1.1)$$

where $S^c$ denotes the complement of S in G, and

$$\delta(S) = \sum_{i \in S} \sum_{j \neq i} \delta_{ij}, \qquad \sigma(S) = \sum_{i \in S} \sum_{j \in S^c} \delta_{ij} \qquad (1.2)$$

are the discrete volume and perimeter of S. The Cheeger constant or conductance of the graph G
is defined as the value of the optimal normalized cut over all non-empty subsets of G, i.e.

$$H(G) = \min\{h(S; G) : S \subset G,\ S \neq \emptyset\}. \qquad (1.3)$$
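
As an illustration of these definitions, the following minimal sketch computes the conductance of a small graph by exhaustive search. It assumes the degree-based reading of the discrete volume (sum over i in S of the weights to all other nodes) and the cut weight as discrete perimeter; the function names and the toy graph are ours, and the search is exponential in the number of nodes, so it is only meant for tiny examples.

```python
from itertools import combinations
import math

def discrete_volume(S, w):
    """delta(S): sum over i in S of the weights w[i][j], j != i."""
    n = len(w)
    return sum(w[i][j] for i in S for j in range(n) if j != i)

def discrete_perimeter(S, w):
    """sigma(S): total weight of the edges joining S to its complement."""
    n = len(w)
    inside = set(S)
    return sum(w[i][j] for i in inside for j in range(n) if j not in inside)

def normalized_cut(S, w):
    """h(S; G) = sigma(S) / min(delta(S), delta(S^c)); a zero denominator gives infinity."""
    n = len(w)
    inside = set(S)
    complement = [j for j in range(n) if j not in inside]
    denom = min(discrete_volume(inside, w), discrete_volume(complement, w))
    if denom == 0:
        return math.inf
    return discrete_perimeter(inside, w) / denom

def cheeger_constant(w):
    """H(G): minimum of h(S; G) over all non-empty subsets S (brute force)."""
    n = len(w)
    best = math.inf
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            best = min(best, normalized_cut(S, w))
    return best

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
w = [[0] * 6 for _ in range(6)]
for i, j in edges:
    w[i][j] = w[j][i] = 1
```

On this toy graph the optimal cut separates the two triangles: it removes one edge, and each side has total degree 7.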

A corresponding quantity can be defined for a domain of a Euclidean space. Let M be a bounded
domain (i.e. open, connected subset) of $\mathbb{R}^d$ with smooth boundary $\partial M$. For an integer $1 \le k \le d$,
let $\mathrm{Vol}_k$ denote the k-dimensional volume (Hausdorff measure) in $\mathbb{R}^d$. For an open subset $A \subset \mathbb{R}^d$,
define its normalized cut with respect to M by

$$h(A; M) = \frac{\mathrm{Vol}_{d-1}(\partial A \cap M)}{\min\{\mathrm{Vol}_d(A \cap M), \mathrm{Vol}_d(A^c \cap M)\}},$$

where $A^c$ denotes the complement of A in $\mathbb{R}^d$ and with the convention that $0/0 = \infty$. The Cheeger
(isoperimetric) constant of M is defined as

$$H(M) = \inf\{h(A; M) : A \subset M\}.$$

Equivalently, the infimum may be restricted to all open subsets A of M such that $\partial A \cap M$ is a smooth
submanifold of co-dimension 1 . This quantity was introduced by Cheeger [[TSll in order to bound 
the eigengap of the spectrum of the Laplacian on a manifold. A Cheeger set is a subset $A \subset M$
such that $h(A; M) = H(M)$; there is always a Cheeger set and it is unique under some conditions
on the domain M [12]. For $A \subset M$, we call $\partial A \cap M$ its relative boundary.



1.1 Consistency of the normalized cut 

Suppose that we observe an i.i.d. random sample $\mathcal{X}_n = (X_1, \ldots, X_n)$ from the uniform distribution
$\mu$ on M. For $r > 0$, let $G_{n,r}$ be the graph with nodes the sample points and edge weights
$\delta_{ij} = \mathbf{1}\{\|X_i - X_j\| \le r\}$, which is an instance of a random geometric graph 031 . Let $\omega_d$ denote the
d-volume of the unit d-dimensional ball, and define

$$\gamma_d = \int_{\mathbb{R}^d} \max(\langle u, z \rangle, 0)\, \mathbf{1}\{\|z\| \le 1\}\, \mathrm{d}z, \qquad (1.4)$$

where u is any unit-norm vector of $\mathbb{R}^d$. In fact, $\gamma_d$ is the average volume of a spherical cap when
the height is chosen uniformly at random. We establish the pointwise consistency of the normalized
cut, which yields an asymptotic upper bound on the Cheeger constant of the neighborhood graph
based on the Cheeger constant of the manifold. This is the first result we know of that relates these
two quantities.
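
For concreteness, the constant in (1.4) can be evaluated numerically. The sketch below is ours and rests on one elementary observation that is not stated in the text: slicing the half-ball by the hyperplanes $\langle u, z \rangle = t$ gives $\gamma_d = \omega_{d-1} \int_0^1 t (1 - t^2)^{(d-1)/2} \mathrm{d}t = \omega_{d-1}/(d+1)$. The code compares this closed form with a midpoint-rule quadrature; function names are hypothetical.

```python
import math

def unit_ball_volume(d):
    """omega_d, the volume of the unit ball of R^d (with omega_0 = 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def gamma_d_closed_form(d):
    """Closed form omega_{d-1} / (d + 1) obtained by slicing the half-ball."""
    return unit_ball_volume(d - 1) / (d + 1)

def gamma_d_quadrature(d, steps=100_000):
    """Midpoint rule for omega_{d-1} * integral of t (1 - t^2)^((d-1)/2) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t * (1.0 - t * t) ** ((d - 1) / 2)
    return unit_ball_volume(d - 1) * total * h
```

In the plane this gives $\gamma_2 = 2/3$, and on the line $\gamma_1 = 1/2$.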

Theorem 1. Let A be a fixed subset of M with smooth relative boundary. Fix a sequence $r_n \to 0$
with $n r_n^{d+1} / \log n \to +\infty$, and let $S_n = A \cap G_{n,r_n}$. Then with probability one

$$\frac{\omega_d}{\gamma_d r_n}\, h(S_n; G_{n,r_n}) \to h(A; M),$$

and, consequently,

$$\limsup_{n \to \infty} \frac{\omega_d}{\gamma_d r_n}\, H(G_{n,r_n}) \le H(M).$$

We do not know whether the Cheeger constant of the neighborhood graph, for an appropriate
choice of the connectivity radius and properly normalized, converges to the Cheeger constant of
the domain.
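
The pointwise statement can be explored by simulation. The following sketch is ours, not part of the paper: it takes M to be the unit disk of the plane, A the half-disk $\{x_1 < 0\}$, for which $h(A; M) = 2/(\pi/2) = 4/\pi$, and computes the normalized cut of the corresponding vertex set in the neighborhood graph, rescaled by $\omega_2/(\gamma_2 r)$ with $\omega_2 = \pi$ and $\gamma_2 = 2/3$. The sample size, radius and seed are arbitrary choices; at such finite values a boundary bias of a few percent is to be expected.

```python
import math
import random

def sample_unit_disk(n, rng):
    """Draw n i.i.d. uniform points in the unit disk by rejection sampling."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            pts.append((x, y))
    return pts

def rescaled_graph_cut(pts, in_A, r):
    """(omega_2 / (gamma_2 r)) * h(S; G_{n,r}) with S = {i : in_A(X_i)}, for d = 2."""
    omega_2, gamma_2 = math.pi, 2.0 / 3.0
    n = len(pts)
    flags = [in_A(p) for p in pts]
    degree = [0] * n
    cut = 0  # sigma(S): number of edges with exactly one endpoint in S
    for i in range(n):
        xi, yi = pts[i]
        for j in range(i + 1, n):
            xj, yj = pts[j]
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r * r:
                degree[i] += 1
                degree[j] += 1
                if flags[i] != flags[j]:
                    cut += 1
    vol_S = sum(deg for deg, f in zip(degree, flags) if f)       # delta(S)
    vol_Sc = sum(deg for deg, f in zip(degree, flags) if not f)  # delta(S^c)
    return omega_2 / (gamma_2 * r) * cut / min(vol_S, vol_Sc)

rng = random.Random(0)
points = sample_unit_disk(1500, rng)
estimate = rescaled_graph_cut(points, lambda p: p[0] < 0.0, r=0.15)
target = 4.0 / math.pi  # h(A; M) for the half-disk
```

The quadratic loop over pairs is kept for readability; a spatial grid would be the natural replacement for larger samples.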

1.2 Consistent estimation of the Cheeger constant and Cheeger sets 

We obtain a consistent estimator of the Cheeger constant H(M) by restricting the minimization
defining the conductance of the neighborhood graph (1.3) to subsets associated with subsets of $\mathbb{R}^d$
with controlled reach. The reach of a subset $S \subset \mathbb{R}^d$ [20], denoted reach(S), is the supremum over
$\eta > 0$ such that, for each x within distance $\eta$ of S, there is a unique point in S that is closest to
x. We assume here that $M \subset (0, 1)^d$. When this is not known and/or not the case, we may always
infer a hypercube that contains M (by taking a hypercube containing all the data points, with some
leeway so that the hypercube contains M with high probability when the sample gets large) and
then rescale and translate the points so that M is within the unit hypercube. So this assumption is
really without loss of generality.

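Theorem 2. Assume that $M \subset (0, 1)^d$ and that $r_n \to 0$ such that $n r_n^{2d+1} \to \infty$. Let $\rho_n \to 0$ slowly
so that $r_n = o(\rho_n^\alpha)$ and $n r_n^{2d+1} \rho_n^\alpha \to \infty$ for all $\alpha > 0$. Let $\mathcal{R}_n$ be a class of open subsets $R \subset (0, 1)^d$
such that $\mathrm{reach}(\partial R) \ge \rho_n$. Define the functional $h_n^\dagger$ over $\mathcal{R}_n$ by

$$h_n^\dagger(R) = \frac{\omega_d}{\gamma_d r_n}\, h(R \cap \mathcal{X}_n; G_{n,r_n})$$

if both R and $R^c$ contain a ball of radius $\rho_n$ centered at a sample point, and $h_n^\dagger(R) = \infty$ otherwise.

(i) With probability one,

$$\min_{R \in \mathcal{R}_n} h_n^\dagger(R) \to H(M), \quad n \to \infty.$$

(ii) Let $(R_n)$ be a sequence satisfying

$$R_n \in \mathcal{R}_n, \qquad h_n^\dagger(R_n) = \min\{h_n^\dagger(R) : R \in \mathcal{R}_n\}. \qquad (1.5)$$

Then with probability one, $\{R_n \cap M\}$ admits a subsequence converging in the $L^1$-metric.
Moreover, any subsequence of $\{R_n \cap M\}$ converging in the $L^1$-metric converges to a Cheeger
set of M.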

Note that the infimum defining $R_n$ in (1.5) is attained in $\mathcal{R}_n$ since the function $h_n^\dagger$ takes only a
finite number of values.

Part (ii) of Theorem 2 hints at a consistent estimate of a Cheeger set of M, but $R_n \cap M$ depends
on M, which is unknown. On the other hand, reconstructing an unknown set from a random sample
of it is an independent problem for which there exist multiple techniques and an important
literature — see e.g., H and the references therein. In the following result we construct a random 
discrete measure which does not require the knowledge of M, and prove that, seen as a sequence
of random measures indexed by the sample size n, any accumulation point is the uniform measure
on a Cheeger set of M.

Theorem 3. Let $\{R_n\}$ be a sequence as in Theorem 2-(ii), and $\{R_{n_k}\}$ a subsequence of $\{R_n\}$ with
$R_{n_k} \cap M \to A_\infty$ in $L^1$. Define the random discrete measure $Q_n = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{R_n}(X_i)\, \delta_{X_i}$, and the measure
$Q = \mathbf{1}_{A_\infty}(\cdot)\, \mu$. Then, that $Q_{n_k}$ converges weakly to Q is an event which holds with probability one.

As an example of an estimate of a Cheeger set of M, one can consider a union of balls of radius
$\kappa_n$ centered at the observations falling in $R_n$. Under appropriate conditions, it is known that this
estimate converges in $L^1$; see [6].
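
A minimal sketch of such a union-of-balls estimate is given below. It is only an illustration under our own conventions: the region is passed as a membership predicate, the radius is a free parameter, and the three sample points are made up.

```python
import math

def union_of_balls(sample, in_region, radius):
    """Membership test for the union of balls of the given radius centered
    at the sample points x for which in_region(x) is true."""
    centers = [x for x in sample if in_region(x)]

    def contains(y):
        return any(math.dist(y, c) <= radius for c in centers)

    return contains

# Hypothetical data: three observations, region = left half of the unit square.
sample = [(0.20, 0.20), (0.25, 0.30), (0.80, 0.75)]
estimate = union_of_balls(sample, lambda x: x[0] < 0.5, radius=0.1)
```

Only the observations falling in the region contribute a ball, so a point near the third observation is not in the estimate.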

Let us mention that with our result, only the "regular" part of a Cheeger set can be reconstructed.
Indeed, in dimension $d \ge 8$, the boundary of a Cheeger set is not necessarily regular and
may contain parts of codimension greater than 1.

1.3 Connections to the literature 

Our results relating the respective Cheeger constants of a domain and of a neighborhood graph 
defined from a sample from the domain are the first of their kind, as far as we know. The con- 
nections to the literature stem from the concept of normalized cut taking a central place in graph 
partitioning and related methods in clustering; from a recent trend in computational geometry (and 
topology) aiming at estimating geometrical (and topological) attributes of a set based on a sample; 
and from the fact that we can use the conductance to bound the mixing time of a random walk on 
the neighborhood graph. 

Clustering. In spectral graph partitioning, the goal is to partition a graph G into subgraphs 
based on the eigenvalues and eigenvectors of the Laplacian [36, 16]. It arises as a convex relaxation
of the combinatorial search of finding an optimal bisection in terms of the normalized cut. Given 
a set of points $X_1, \ldots, X_n$ and a dissimilarity measure (or kernel) $\phi$, spectral clustering applies
spectral graph partitioning to the graph with nodes the data points and edge weight $\delta_{ij} = \phi(X_i, X_j)$
between $X_i$ and $X_j$ OTll . For instance, if the points are embedded in a Euclidean space, the kernel
is often of the form $\phi(x, y) = \psi(\|x - y\|/\sigma)$, where $\sigma$ is a tuning parameter, and $\psi$ is, e.g., the
Gaussian kernel $\psi(t) = \exp(-t^2)$ or the simple kernel $\psi(t) = \mathbf{1}_{[0,1]}(t)$ Il30l I3l. The consistency of
spectral methods has been analyzed in this context I0^[321 I^ I2T1|351 . In particular, lEHll proves a 
result similar to our Theorem 1 in that context.
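
The two kernels mentioned above can be written down directly. The sketch below is ours: it builds the weight matrix $\delta_{ij} = \psi(\|X_i - X_j\|/\sigma)$ for a list of points, with the diagonal set to zero (a convention we adopt here, since self-loops play no role in the cuts considered).

```python
import math

def gaussian_profile(t):
    """psi(t) = exp(-t^2)."""
    return math.exp(-t * t)

def simple_profile(t):
    """psi(t) = indicator of [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def weight_matrix(points, sigma, profile):
    """Edge weights psi(||x_i - x_j|| / sigma), with zero diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else profile(math.dist(points[i], points[j]) / sigma)
         for j in range(n)]
        for i in range(n)
    ]

pts = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
w_simple = weight_matrix(pts, 1.0, simple_profile)
w_gauss = weight_matrix(pts, 1.0, gaussian_profile)
```

With the simple kernel and $\sigma = r$, this is exactly the neighborhood graph $G_{n,r}$ studied in this paper.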

About cuts, [27] also proves a result similar to our Theorem 1 when the separating surface
$\partial A$ is an affine hyperplane. Closer to our Theorem 2, [29] establishes rates for learning a cut for
classification purposes, so the setting there is that of supervised learning, with each sample point
$X_i$ associated with a class label $Y_i$.

Computational geometry (and topology). The Cheeger constant H{M), and Cheeger sets, 
are bona fide geometric characteristics of the domain M that we might want to estimate, follow- 
ing a fast developing line of research around the estimation of some geometric and topological 
characteristics of sets from a sample, e.g., the number of connected components [5], the intrinsic
dimensionality [26] and, more generally, the homology (311 [IS El SD [HUM [111; the Minkowski
content [17], as well as the perimeter and area (volume) [[SI.

Random walks. Random geometric graphs are gaining popularity as models for real-life networks.
Some protocols for passing information between nodes amount to performing a random
walk and it is important to bound the time it takes for information to spread to the whole network;
see [2] and references therein. It is well-known that, given a graph G, a lower bound on H(G) may
be used to bound the mixing time of the random walk on G. This is the path taken in [|7l O when
M is the unit hypercube and the graph is $G_{n,r}$. However, in both papers the authors reduce the
setting to that of a regular grid without rigorous justification, leaving the problem unresolved (in
our opinion) even in this particular case.

1.4 Discussion 

As we saw, there are only a handful of other papers relating cuts in neighborhood graphs and
cuts in the corresponding domain from which the points making up the neighborhood graph were
sampled. Our paper is the first one we know of that establishes a relationship between the
Cheeger constant (optimal normalized cut) on the neighborhood graph and the Cheeger constant
of the domain, and the first one to propose a method that is consistent for the estimation of the
latter based on a restricted normalized cut, and also consistent for the estimation of Cheeger sets.
Our results generalize with varying amounts of effort to other related settings. However, we leave
important questions open.

Generalizations. With some additional work, our results and methodology extend to settings 
where the kernel (here the simple kernel) is fast decaying and where the data points are sampled 
from a probability distribution on M that has a non-vanishing density with respect to the uniform 
distribution. It would also be interesting to consider the setting where M is a d-dimensional smooth
submanifold embedded in some Euclidean ambient space. Our arguments seem to carry through
using a set of charts for the manifold M, as is done in [9, Lem. 3.4].

Refinements. Though we focused on sufficient conditions for $r_n$ to enable a consistent estimation
of the Cheeger constant of the domain, it may also be of interest to find necessary conditions.
Partial work suggests that $n r_n^d \to \infty$ is necessary, and may be sufficient if the divergence to infinity
is faster than a sufficiently large power of $\log n$. The arguments in support of this, however, are
substantially different from those we use in the paper, which hinge on Hoeffding's inequality for
U-statistics.

An open problem. Whether the normalized Cheeger constants of some sequence of neighborhood
graphs converge to the Cheeger constant of the domain is an intriguing question. To
paraphrase the question we leave open, is there a sequence $\{r_n\}$ such that, with probability one,

$$\lim_{n \to \infty} \frac{\omega_d}{\gamma_d r_n}\, H(G_{n,r_n}) = H(M)?$$

A positive answer would establish the consistency of the normalized cut criterion for graph partitioning.
Also, a lower bound on $H(G_{n,r_n})$ would provide a lower bound on the eigengap between
the first and second eigenvalue of the Laplacian, which in turn may be used to bound the mixing
time of the random walk on $G_{n,r_n}$, as done in [7] when M is the unit hypercube.

Consistent estimation in polynomial time. Our estimation procedures, though theoretically
valid and consistent, are not practical. It would be interesting to know whether there is a consistent
estimator for the Cheeger constant that can be implemented in polynomial time. Note that computing
the Cheeger constant of a graph is NP-hard (which motivates the use of spectral methods),
and even the best polynomial-time approximations we are aware of are not precise enough to allow
for consistency.



1.5 Content 

The rest of the paper is devoted to the proofs of the three theorems. In Section 2, we establish
the convergence of the discrete volume and perimeter to their continuous counterparts for a fixed
subset of M with smooth relative boundary, using Hoeffding's inequality for U-statistics [24].
Then, by the lower semi-continuity of the map $A \mapsto h(A; M)$, we deduce the supremum-limit
bound of Theorem 1. In Section 3, we prove Theorems 2 and 3 by utilizing results on empirical
U-processes [18] on the one hand, and compactness properties of the $L^1$-metric [23] on the other
hand.



1.6 Notation and background 

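The uniform measure on M is denoted $\mu$, so that $\mu(A) = \mathrm{Vol}_d(A \cap M)/\mathrm{Vol}_d(M)$; and the normalized
perimeter is denoted $\nu(A) = \mathrm{Vol}_{d-1}(\partial A \cap M)/\mathrm{Vol}_d(M)$. Let $\tau_M = \mathrm{Vol}_d(M)$, and define the discrete
volume and perimeter as

$$\mu_n(A) = \frac{\tau_M}{\omega_d\, n(n-1)\, r_n^d}\, \delta(A \cap \mathcal{X}_n; G_{n,r_n}), \qquad \nu_n(A) = \frac{\tau_M}{\gamma_d\, n(n-1)\, r_n^{d+1}}\, \sigma(A \cap \mathcal{X}_n; G_{n,r_n}),$$

where $\delta$, $\sigma$ are given in (1.2), $\mathcal{X}_n$ is the sample, and $G_{n,r_n}$ the neighborhood graph. Also, define the
discrete ratio

$$h_n(A) = \frac{\nu_n(A)}{\min\{\mu_n(A), \mu_n(A^c)\}},$$

and note that

$$h_n(A) = \frac{\omega_d}{\gamma_d r_n}\, h(A \cap \mathcal{X}_n; G_{n,r_n}),$$

where h is given in (1.1). For further reference, we define the volume $\pi_d(\eta)$ of a spherical cap at
height $\eta$ by

$$\pi_d(\eta) = \mathrm{Vol}_d\{x : \|x\| \le 1 \text{ and } \langle u, x \rangle \ge \eta\},$$

where u is any unit-norm vector of $\mathbb{R}^d$. Note that the constant $\gamma_d$ defined in (1.4) may be expressed
as

$$\gamma_d = \int_0^1 \pi_d(\eta)\, \mathrm{d}\eta.$$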
Jo 

The reach coincides with the condition number introduced in [31] for submanifolds without
boundary, and the property $\mathrm{reach}(\partial A) \ge r$ is equivalent to A and $A^c$ being both r-convex [39], in
the sense that a ball of radius r rolls freely inside A and $A^c$. (We say that a ball of radius r rolls
freely in A if, for all $p \in \partial A$, there is $x \in A$ such that $p \in \partial B(x, r)$ and $B(x, r) \subset A$.) It is well-known
that the reach bounds the radius of curvature from below [20, Thm. 4.18]. In particular, if
$\mathrm{reach}(\partial A) > 0$, then $\partial A$ is a smooth submanifold (possibly with boundary).

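In the rest of the paper, the generic constant C may vary from line to line, except when stated
explicitly otherwise.

2 Proof of Theorem 1: Consistency of the normalized cut

For a subset A of M and a real number $r > 0$, define the symmetric kernel

$$\phi_{A,r}(x, y) = \frac{1}{2}\{\mathbf{1}_A(x) + \mathbf{1}_A(y)\}\, \mathbf{1}\{\|x - y\| \le r\}, \qquad (2.1)$$

so that $\mu_n(A)$ may be expressed as the following U-statistic:

$$\mu_n(A) = \frac{\tau_M}{\omega_d r_n^d} \cdot \frac{1}{n(n-1)} \sum_{i \neq j} \phi_{A,r_n}(X_i, X_j).$$

Similarly, $\nu_n(A)$ may be written as

$$\nu_n(A) = \frac{\tau_M}{\gamma_d r_n^{d+1}} \cdot \frac{1}{n(n-1)} \sum_{i \neq j} \bar\phi_{A,r_n}(X_i, X_j),$$

with the symmetric kernel

$$\bar\phi_{A,r}(x, y) = \frac{1}{2}\{\mathbf{1}_A(x)\mathbf{1}_{A^c}(y) + \mathbf{1}_A(y)\mathbf{1}_{A^c}(x)\}\, \mathbf{1}\{\|x - y\| \le r\}. \qquad (2.2)$$

We shall need the following Hoeffding's inequality for U-statistics [24], which is a special case
of [18, Thm. 4.1.8].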

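Theorem 4. Let $\phi$ be a measurable, bounded kernel on $\mathbb{R}^d \times \mathbb{R}^d$ and let $\{X_k : k \in \mathbb{N}\}$ be i.i.d. random
vectors in $\mathbb{R}^d$. Assume that $\mathbb{E}[\phi(X_1, X_2)] = 0$ and that $b := \|\phi\|_\infty < \infty$, and let $\sigma^2 = \mathrm{Var}(\phi(X_1, X_2))$.
Then, for all $t > 0$,

$$\mathbb{P}\left[\frac{1}{n(n-1)} \sum_{i \neq j} \phi(X_i, X_j) \ge t\right] \le \exp\left(-\frac{n t^2}{5\sigma^2 + 3bt}\right).$$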
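
To get a feel for the scale of this inequality, the sketch below evaluates the right-hand side and inverts it for the sample size. It assumes the bound reads $\exp(-n t^2 / (5\sigma^2 + 3bt))$; the function names and the numerical values in the example are ours.

```python
import math

def hoeffding_u_bound(n, t, sigma2, b):
    """Right-hand side exp(-n t^2 / (5 sigma^2 + 3 b t)) of the tail bound."""
    return math.exp(-n * t * t / (5.0 * sigma2 + 3.0 * b * t))

def min_sample_size(t, sigma2, b, alpha):
    """Smallest n for which the bound is at most alpha."""
    return math.ceil(math.log(1.0 / alpha) * (5.0 * sigma2 + 3.0 * b * t) / (t * t))

# Example values (arbitrary): deviation 0.1, variance 0.25, sup-norm 1, level 5%.
n_needed = min_sample_size(0.1, 0.25, 1.0, 0.05)
```

For kernels with small variance, as is the case for the neighborhood kernels above when r is small, the variance term in the denominator is what yields the rates of Section 2.4.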



To prove Theorem 1, we establish the almost-sure convergence of $\mu_n(A)$ to $\mu(A)$ and of $\nu_n(A)$ to
$\nu(A)$ for a subset $A \subset M$ with smooth relative boundary. To this aim, we combine upper bounds on
bias terms together with exponential inequalities for U-statistics. The bias terms involve volume
bounds, which we present next, and integrations over some neighborhoods of the boundary of a
regular set, namely tubular neighborhoods or simply tubes, which come after that.



2.1 Volume bounds 

For any r > 0, define 



$$M_r = \{x \in M : \mathrm{dist}(x, \partial M) \ge r\}. \qquad (2.3)$$



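The following two lemmas provide bounds on the volume of the intersection of balls with some
subsets of M.

Lemma 5. Let R be a bounded open subset of $\mathbb{R}^d$ with $\mathrm{reach}(\partial R) = \rho > 0$. Set $A = R \cap M$. For any
$r < \min\{\mathrm{reach}(\partial M), \rho\}$, any $0 < \eta < 1$, and all p in $\partial A \cap M_r$, we have

$$\left|\mathrm{Vol}_d(B(p + \eta r e_p, r) \cap A^c) - \pi_d(\eta) r^d\right| \le 2\omega_{d-1} r^{d+1}/\rho,$$

where $e_p$ denotes the unit normal vector at p pointing inward A.

Proof. For ease of notation, set $B = B(p + \eta r e_p, r)$. Let $(e_1, \ldots, e_d)$ be an orthonormal frame at p,
with $e_d = e_p$. Denote by $x^1, \ldots, x^d$ the local coordinates in this frame, such that p has coordinates
0. Then $\partial A \cap M$ can be expressed locally as the set of points x such that $x^d = F(x^1, \ldots, x^{d-1})$ for
some function F, and, if we set $x^{(d)} = (x^1, \ldots, x^{d-1})$, then

$$\mathrm{Vol}_d(B \cap A^c) = \int_B \mathbf{1}\{x^d \le F(x^{(d)})\}\, \mathrm{d}x = \int_B \left[\mathbf{1}\{x^d \le F(x^{(d)})\}\mathbf{1}\{x^d \le 0\} + \mathbf{1}\{x^d \le F(x^{(d)})\}\mathbf{1}\{x^d > 0\}\right] \mathrm{d}x.$$

Since

$$\pi_d(\eta) r^d = \int_B \mathbf{1}\{x^d \le 0\}\, \mathrm{d}x,$$

it follows that

$$\left|\mathrm{Vol}_d(B \cap A^c) - \pi_d(\eta) r^d\right| \le \int_B \left[\mathbf{1}\{x^d > F(x^{(d)})\}\mathbf{1}\{x^d \le 0\} + \mathbf{1}\{x^d \le F(x^{(d)})\}\mathbf{1}\{x^d > 0\}\right] \mathrm{d}x$$
$$\le \int_B \mathbf{1}\{|x^d| \le |F(x^{(d)})|\}\, \mathrm{d}x \le 2 \int_{\{\|x^{(d)}\| \le r\}} |F(x^{(d)})|\, \mathrm{d}x^{(d)}.$$

Expanding F at 0, we have, for all x with $\|x\| \le r$,

$$F(x^{(d)}) = \sum_{i,j=1}^{d-1} G_{ij}(\xi)\, x^i x^j$$

for some $\xi := \xi(x^{(d)})$. Since the reach bounds the principal curvatures by $1/\rho$ [20], we have
$\sup_{p \in \partial A \cap M_r} \|G(p)\| \le 1/\rho$. Then, using the change of variable u = rx, we deduce that

$$\left|\mathrm{Vol}_d(B(p + \eta r e_p, r) \cap A^c) - \pi_d(\eta) r^d\right| \le 2\omega_{d-1} \sup_{p \in \partial A \cap M_r} \|G(p)\|\, r^{d+1}. \qquad \Box$$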

Lemma 6. There exists some constant $C > 0$ such that, for all r, a satisfying $0 < 2r < a <
\mathrm{reach}(\partial M)$, and all x in M,

$$\mathrm{Vol}_d(B(x, a) \cap M_r) \ge C a^d.$$

Proof. The main argument is to include a ball of radius a/4 into $B(x, a) \cap M_r$. We can proceed
in the following way. First, because $\rho := \mathrm{reach}(\partial M) > 0$, for any $x \in M$ there is $y \in M$ such that
$x \in B(y, \rho) \subset M$. Second, since $\mathrm{dist}(y, \partial M) \ge \rho$ and $\rho > 2r$, we have $y \in M_r$ and $B(y, \rho - r) \subset M_r$.
Hence

$$B(x, a) \cap B(y, \rho - r) \subset B(x, a) \cap M_r.$$

If y = x, the result is trivial. Otherwise, let $z := x + (r + a/4)(y - x)/\|y - x\|$ and note that $B(z, a/4)$
is a ball of radius a/4 included in $B(x, a) \cap B(y, \rho - r)$. □

2.2 Integration over tubes

We introduce the notion of tubes and some of their properties; see [22] for an extensive treatment.
Let S be a submanifold of $\mathbb{R}^d$. The tubular neighborhood of radius $r > 0$ about S, denoted $\mathcal{V}(S, r)$,
is the set of points x in $\mathbb{R}^d$ for which there exists $s \in S$ with $\|x - s\| \le r$ and such that the line
joining x and s is orthogonal to S at s. When S is without boundary, $\mathcal{V}(S, r)$ coincides with the set
of points x in $\mathbb{R}^d$ at a distance no more than r from S. If S has boundary, then the tube coincides
with the set of points at distance no more than r, with the ends removed, corresponding to the
points projecting onto $\partial S$. Assume S is of codimension 1, and oriented, and define $e_p$ as the (unit)
normal vector of S at $p \in S$. When $r < \mathrm{reach}(S)$, $\mathcal{V}(S, r)$ admits the following parameterization

$$\mathcal{V}(S, r) = \{x = p + t e_p : p \in S,\ -r \le t \le r\}.$$

Denote by $\mathrm{II}_p$ the second fundamental form of S at $p \in S$. The infinitesimal change of volume
function is defined on $S \times (-r, r)$ by $\vartheta(p, t) = \det(I - t\, \mathrm{II}_p)$; the dependence of $\vartheta$ on S is omitted.
Given an integrable function g on $\mathcal{V}(S, r)$, we have:

$$\int_{\mathcal{V}(S, r)} g(x)\, \mathrm{d}x = \int_S \int_{-r}^{r} g(p + t e_p)\, \vartheta(p, t)\, \mathrm{d}t\, v_\sigma(\mathrm{d}p),$$

where $v_\sigma$ is the Riemannian volume measure on S.

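Lemma 7. Assume S is a submanifold of $\mathbb{R}^d$ of codimension 1, with $\rho := \mathrm{reach}(S) > 0$. Then, for
all $r < \rho$,

$$\sup_{p \in S}\ \sup_{-r \le t \le r} \vartheta(p, t) \le (1 + r/\rho)^{d-1},$$

and

$$\sup_{p \in S}\ \sup_{-r \le t \le r} |\vartheta'(p, t)| \le \frac{(d-1)(1 + r/\rho)^{d-1}}{\rho - r},$$

where $\vartheta'$ is the derivative of $\vartheta$ with respect to t.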

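Proof. By [20, Thm. 4.18], the reach bounds the radius of curvature from below so that the principal
curvatures $\kappa_1, \ldots, \kappa_{d-1}$ (the eigenvalues of the second fundamental form) are everywhere
bounded (in absolute value) from above by $1/\rho$. Therefore, for $r < \rho$ and $-r \le t \le r$,

$$0 < \vartheta(p, t) = \det(I - t\, \mathrm{II}_p) = \prod_{i=1}^{d-1} (1 - t \kappa_i(p)) \le (1 + r/\rho)^{d-1}.$$

For the derivative of $\vartheta$, we have

$$\frac{\vartheta'(p, t)}{\vartheta(p, t)} = -\sum_{i=1}^{d-1} \frac{\kappa_i(p)}{1 - t \kappa_i(p)}.$$

Hence

$$|\vartheta'(p, t)| \le \vartheta(p, t)\, (d-1)\, \frac{1/\rho}{1 - r/\rho} \le \frac{(d-1)(1 + r/\rho)^{d-1}}{\rho - r}. \qquad \Box$$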
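
The two bounds of Lemma 7 are easy to probe numerically once the second fundamental form is diagonalized. The sketch below is ours: it draws principal curvatures uniformly in $[-1/\rho, 1/\rho]$ and offsets t uniformly in $[-r, r]$ (both sampling choices are arbitrary) and checks the product formula for $\vartheta$ and its derivative against the stated bounds.

```python
import random

def theta(t, curvatures):
    """Infinitesimal change of volume: product of (1 - t * kappa_i)."""
    out = 1.0
    for k in curvatures:
        out *= 1.0 - t * k
    return out

def theta_prime(t, curvatures):
    """Derivative in t: -theta(t) * sum of kappa_i / (1 - t * kappa_i)."""
    return -theta(t, curvatures) * sum(k / (1.0 - t * k) for k in curvatures)

def check_lemma7(d, rho, r, trials, rng):
    """Return True if both bounds hold on all sampled (curvatures, t) pairs."""
    bound_theta = (1.0 + r / rho) ** (d - 1)
    bound_deriv = (d - 1) * bound_theta / (rho - r)
    for _ in range(trials):
        ks = [rng.uniform(-1.0 / rho, 1.0 / rho) for _ in range(d - 1)]
        t = rng.uniform(-r, r)
        if not (0.0 < theta(t, ks) <= bound_theta + 1e-12):
            return False
        if abs(theta_prime(t, ks)) > bound_deriv + 1e-12:
            return False
    return True
```

Since $|t \kappa_i| \le r/\rho < 1$, every factor $1 - t\kappa_i$ stays positive, which is why the division in `theta_prime` is safe.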
The celebrated Weyl's tube formula [40] provides fine estimates for the volume of a tubular
region around a smooth submanifold of $\mathbb{R}^d$. We only require a rough upper bound of the right
order of magnitude, which we state and prove here.

Lemma 8. For any bounded open subset $R \subset \mathbb{R}^d$ with $\mathrm{reach}(\partial R) = \rho > 0$ and any $0 < r < \rho$,

$$\mathrm{Vol}_d(\mathcal{V}(\partial R, r)) \le 2^d\, \mathrm{Vol}_{d-1}(\partial R)\, r.$$

In particular, Lemma 8 implies

$$\mu[\mathcal{V}(\partial M, r)] \le C r, \quad \forall r < \mathrm{reach}(\partial M), \qquad (2.4)$$

where C is a constant depending only on M.

Proof. Using the uniform bound of the infinitesimal change of volume given in Lemma 7, we have

$$\mathrm{Vol}_d[\mathcal{V}(\partial R, r)] = \int_{\partial R} \int_{-r}^{r} \vartheta(p, u)\, \mathrm{d}u\, v_\sigma(\mathrm{d}p) \le \mathrm{Vol}_{d-1}(\partial R)\, 2r (1 + r/\rho)^{d-1} \le 2^d\, \mathrm{Vol}_{d-1}(\partial R)\, r. \qquad \Box$$
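
As a worked instance of Lemma 8, take R to be a ball of radius $\rho$, whose boundary sphere has reach $\rho$. The tube of radius r about the sphere is then the shell between radii $\rho - r$ and $\rho + r$, so both sides of the inequality are explicit. The sketch below, with function names of our choosing, compares them.

```python
import math

def unit_ball_volume(d):
    """omega_d, the volume of the unit ball of R^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def sphere_tube_volume(d, rho, r):
    """Volume of the shell {rho - r <= ||x|| <= rho + r}, for 0 < r < rho."""
    return unit_ball_volume(d) * ((rho + r) ** d - (rho - r) ** d)

def lemma8_bound(d, rho, r):
    """2^d * Vol_{d-1}(sphere of radius rho) * r, with Vol_{d-1} = d omega_d rho^(d-1)."""
    return 2 ** d * d * unit_ball_volume(d) * rho ** (d - 1) * r
```

In the plane the shell has area $4\pi\rho r$, against a bound of $8\pi\rho r$, which illustrates that the lemma is rough but of the right order in r.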

2.3 Bounds on bias terms

Recall the definition of $M_r$ in (2.3).

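Lemma 9. Let $\phi_{A,r}$ be defined as in (2.1). There exists a constant C, depending only on M, such
that, for any $A \subset M$ and $r < \mathrm{reach}(\partial M)$,

$$\left|\frac{\tau_M}{\omega_d r^d}\, \mathbb{E}[\phi_{A,r}(X_1, X_2)] - \mu(A)\right| \le \mu(A \cap M_r^c).$$

Proof. Assume without loss of generality that $\tau_M = 1$. We first note that

$$\mathbb{E}[\phi_{A,r}(X_1, X_2)] = \mathbb{E}[\mathbf{1}_A(X_1)\mathbf{1}\{\|X_1 - X_2\| \le r\}].$$

We partition A into $A \cap M_r$ and $A \cap M_r^c$. By conditioning on $X_1$, we have

$$\mathbb{E}[\mathbf{1}_{A \cap M_r}(X_1)\mathbf{1}\{\|X_1 - X_2\| \le r\}] = \omega_d r^d \mu(A \cap M_r) = \omega_d r^d \mu(A) - \omega_d r^d \mu(A \cap M_r^c);$$

$$\mathbb{E}[\mathbf{1}_{A \cap M_r^c}(X_1)\mathbf{1}\{\|X_1 - X_2\| \le r\}] \le \omega_d r^d \mu(A \cap M_r^c).$$

Hence the result. □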
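
The bias bound can be checked by hand in dimension one. The sketch below is our own toy case: M = (0, 1), so $\tau_M = 1$ and $\omega_1 = 2$, and A = (0, a). The expectation of the kernel is the integral over A of the length of $[x - r, x + r] \cap M$, which we evaluate by the midpoint rule; for $r < a < 1 - r$ the bias equals $-r/4$, well within the bound $\mu(A \cap M_r^c) = r$.

```python
def expected_kernel(a, r, steps=100_000):
    """E[phi_{A,r}(X1, X2)] for X uniform on M = (0, 1) and A = (0, a)."""
    h = a / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        # length of [x - r, x + r] intersected with M
        total += min(x + r, 1.0) - max(x - r, 0.0)
    return total * h

def bias_and_bound(a, r):
    """Return (E[phi] / (omega_1 r) - mu(A), mu(A minus M_r)) in this 1-d example."""
    omega_1 = 2.0
    bias = expected_kernel(a, r) / (omega_1 * r) - a
    # M_r = [r, 1 - r]: A minus M_r is (0, r), plus (1 - r, a) when a > 1 - r
    bound = min(a, r) + max(0.0, a - (1.0 - r))
    return bias, bound

bias, bound = bias_and_bound(0.4, 0.1)
```

The negative sign of the bias reflects the fact that points near the boundary of M have fewer neighbors than interior points.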

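Lemma 10. Let $A = R \cap M$, where R is a bounded domain with smooth boundary and $\mathrm{reach}(\partial R) = \rho > 0$.
Let $\bar\phi_{A,r}$ be defined as in (2.2).

(i) There exists a constant C, depending only on M, such that, for any $A \subset M$ and
$r < \min\{\rho/2, \mathrm{reach}(\partial M)\}$,

$$\left|\frac{\tau_M}{\gamma_d r^{d+1}}\, \mathbb{E}[\bar\phi_{A,r}(X_1, X_2)] - \nu(A)\right| \le C \left\{\mathrm{Vol}_{d-1}(\partial R \cap \mathcal{V}(\partial M, r)) + \mathrm{Vol}_{d-1}(\partial R \cap M)\, \frac{r}{\rho}\right\}.$$

(ii) There exists a constant C, depending only on M, such that, for any $A \subset M$ and
$r < \min\{\rho/2, \mathrm{reach}(\partial M)\}$,

$$\frac{\tau_M}{\gamma_d r^{d+1}}\, \mathbb{E}[\bar\phi_{A,r}(X_1, X_2)] - \frac{\mathrm{Vol}_{d-1}(\partial R \cap M_r)}{\mathrm{Vol}_d(M)} \ge -C \nu(A)\, \frac{r}{\rho}. \qquad (2.5)$$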

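Proof. Assume without loss of generality that $\tau_M = 1$. Let S denote $\partial R \cap M$. Then

$$\mathbb{E}[\bar\phi_{A,r}(X_1, X_2)] = \mathbb{E}[\mathbf{1}_A(X_1)\mathbf{1}_{A^c}(X_2)\mathbf{1}\{\|X_1 - X_2\| \le r\}] = \int_D \mathrm{Vol}_d[B(x, r) \cap A^c]\, \mu(\mathrm{d}x),$$

where

$$D = \{x \in A : \mathrm{dist}(x, \partial R) \le r\}.$$

Since $r < \rho$, the projection on $\partial R$ is well-defined on D, and any x in D can be written as $x = p + t e_p$,
for $p \in \partial R$ and with $e_p$ the unit normal vector of $\partial R$ at p pointing inwards.

We partition D into $D \cap M_r$ and $D \cap M_r^c$. Denote by $S_r$ the projection of $D \cap M_r$ on S. We have

$$\int_{D \cap M_r} \mathrm{Vol}_d[B(x, r) \cap A^c]\, \mathrm{d}x = \int_{S_r} \int_{-r}^{0} \mathrm{Vol}_d[B(p + t e_p, r) \cap A^c]\, \vartheta(p, t)\, \mathrm{d}t\, v_\sigma(\mathrm{d}p).$$

Therefore

$$\left|\frac{1}{r^{d+1}} \int_{D \cap M_r} \mathrm{Vol}_d[B(x, r) \cap A^c]\, \mathrm{d}x - \gamma_d \nu(A)\right| \qquad (2.6)$$
$$\le \frac{1}{r^d} \int_{S_r} \int_0^1 \left|\mathrm{Vol}_d[B(p - \eta r e_p, r) \cap A^c] - \pi_d(\eta) r^d\right| \vartheta(p, r\eta)\, \mathrm{d}\eta\, v_\sigma(\mathrm{d}p)$$
$$+ \left|\int_{S_r} \int_0^1 \pi_d(\eta)\, \vartheta(p, r\eta)\, \mathrm{d}\eta\, v_\sigma(\mathrm{d}p) - \gamma_d \nu(A)\right|.$$

Lemma 5 provides the inequality $|\mathrm{Vol}_d[B(p - \eta r e_p, r) \cap A^c] - \pi_d(\eta) r^d| \le 2\omega_{d-1} r^{d+1}/\rho$, and the
first inequality of Lemma 7 states that $\sup_{p \in S} \sup_{-r \le t \le r} \vartheta(p, t) \le (1 + r/\rho)^{d-1}$. Since $r < \rho$,
$\sup_{p \in S} \sup_{0 \le \eta \le 1} \vartheta(p, r\eta) \le 2^{d-1}$. Hence, the first term on the right-hand side is bounded by

$$2\omega_{d-1}(r/\rho) \int_{S_r} \int_0^1 \vartheta(p, r\eta)\, \mathrm{d}\eta\, v_\sigma(\mathrm{d}p) \le 2^d \omega_{d-1}(r/\rho)\, \mathrm{Vol}_{d-1}(S_r).$$

To bound the second term, a Taylor expansion leads to the relation $\vartheta(p, r\eta) = 1 + \vartheta'(p, r\xi_\eta)\, r\eta$
for some $0 < \xi_\eta < 1$. The second inequality of Lemma 7 states that $\sup_{p \in S} \sup_{-r \le t \le r} |\vartheta'(p, t)| \le
(d-1)(1 + r/\rho)^{d-1}/(\rho - r)$, so that $\sup_{p \in S} \sup_{0 \le \eta \le 1} |\vartheta'(p, r\xi_\eta)|$ is bounded by $(d-1)2^d/\rho$ since $r < \rho/2$.
Recall that the constant $\gamma_d$ is expressed as $\gamma_d = \int_0^1 \pi_d(\eta)\, \mathrm{d}\eta$. Then the second term in the right-hand
side of (2.6) is bounded by

$$\left|\int_{S_r} \int_0^1 \pi_d(\eta)\, \mathrm{d}\eta\, v_\sigma(\mathrm{d}p) - \gamma_d \nu(A)\right| + r \int_{S_r} \int_0^1 \eta\, \pi_d(\eta)\, |\vartheta'(p, r\xi_\eta)|\, \mathrm{d}\eta\, v_\sigma(\mathrm{d}p)$$
$$\le \gamma_d \left|\mathrm{Vol}_{d-1}(S_r) - \mathrm{Vol}_{d-1}(S)\right| + (d-1) 2^d \gamma_d (r/\rho)\, \mathrm{Vol}_{d-1}(S_r)$$
$$\le \gamma_d\, \mathrm{Vol}_{d-1}(S \cap M_r^c) + (d-1) 2^d \gamma_d (r/\rho)\, \mathrm{Vol}_{d-1}(S_r),$$

where we have used the fact that $S \setminus S_r \subset S \cap M_r^c$ since $S \cap M_r \subset S_r$. Collecting terms, the term in (2.6)
is bounded by

$$\gamma_d\, \mathrm{Vol}_{d-1}(S \cap M_r^c) + C \frac{r}{\rho}\, \mathrm{Vol}_{d-1}(S_r),$$

for some constant C independent of M.

For the integral over $D \cap M_r^c$, since $D \cap M_r^c$ is included in the intersection of tubes of radius r about
$\partial R$ and $\partial M$, i.e., $D \cap M_r^c \subset \mathcal{V}(\partial R, r) \cap \mathcal{V}(\partial M, r)$, we have

$$\int_{D \cap M_r^c} \mathrm{Vol}_d[B(x, r) \cap A^c]\, \mathrm{d}x \le \int_{\partial R \cap \mathcal{V}(\partial M, r)} \int_{-r}^{0} \mathrm{Vol}_d[B(p + t e_p, r) \cap A^c]\, \vartheta(p, t)\, \mathrm{d}t\, v_\sigma(\mathrm{d}p)$$
$$= r \int_{\partial R \cap \mathcal{V}(\partial M, r)} \int_0^1 \mathrm{Vol}_d[B(p - \eta r e_p, r) \cap A^c]\, \vartheta(p, r\eta)\, \mathrm{d}\eta\, v_\sigma(\mathrm{d}p)$$
$$\le 2^{d-1} \omega_d r^{d+1}\, \mathrm{Vol}_{d-1}(\partial R \cap \mathcal{V}(\partial M, r)),$$

where we have used Lemma 7 again to bound $|\vartheta(p, r\eta)|$ by $(1 + r/\rho)^{d-1} \le 2^{d-1}$ in the last inequality.
Combining the two inequalities on the integrals over $D \cap M_r$ and $D \cap M_r^c$, we obtain that

$$\left|\frac{1}{\gamma_d r^{d+1}}\, \mathbb{E}[\bar\phi_{A,r}(X_1, X_2)] - \nu(A)\right| \le \mathrm{Vol}_{d-1}(S \cap M_r^c) + C \frac{r}{\rho}\, \mathrm{Vol}_{d-1}(S_r) + 2^{d-1} \frac{\omega_d}{\gamma_d}\, \mathrm{Vol}_{d-1}(\partial R \cap \mathcal{V}(\partial M, r))$$
$$\le C \left\{\mathrm{Vol}_{d-1}(\partial R \cap \mathcal{V}(\partial M, r)) + \mathrm{Vol}_{d-1}(S)\, \frac{r}{\rho}\right\},$$

which proves the first bound stated in Lemma 10.

To prove (ii), using the bound on (2.6), we deduce that

$$\frac{1}{\gamma_d r^{d+1}}\, \mathbb{E}[\bar\phi_{A,r}(X_1, X_2)] \ge \frac{1}{\gamma_d r^{d+1}} \int_{D \cap M_r} \mathrm{Vol}_d[B(x, r) \cap A^c]\, \mathrm{d}x$$
$$\ge \mathrm{Vol}_{d-1}(S) - \left[\mathrm{Vol}_{d-1}(S \cap M_r^c) + C \frac{r}{\rho}\, \mathrm{Vol}_{d-1}(S_r)\right]$$
$$\ge \mathrm{Vol}_{d-1}(S \cap M_r) - C \frac{r}{\rho}\, \mathrm{Vol}_{d-1}(S_r),$$

and since $S_r \subset S$, the result follows. □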
2.4 Exponential inequalities

Proposition 11. Fix a sequence $r_n \to 0$. Let $A \subset M$ be an arbitrary open subset of M. There exists
a constant C depending only on M such that, for any $\varepsilon > 0$, and all n large enough, we have

$$\mathbb{P}\left[|\mu_n(A) - \mu(A)| \ge \varepsilon\right] \le 2 \exp\left(-\frac{n r_n^d \varepsilon^2}{C(1 + \varepsilon)}\right).$$

In particular, if $n r_n^d / \log n \to \infty$, then $\mu_n(A)$ converges almost surely to $\mu(A)$ when $n \to \infty$.

Proof. By the triangle inequality, we have

$$|\mu_n(A) - \mu(A)| \le |\mu_n(A) - \mathbb{E}[\mu_n(A)]| + |\mathbb{E}[\mu_n(A)] - \mu(A)|.$$

For all n large enough such that $r_n < \mathrm{reach}(\partial M)$, the second term on the right-hand side (the bias
term) is bounded by $C r_n$ with C depending only on M. Indeed, Lemma 9 states that the bias is
at most $\mu(A \cap M_{r_n}^c)$. And the tubular neighborhood of $\partial M$ of radius $r_n$, which contains $A \cap M_{r_n}^c$,
has a volume bounded by $C r_n$ by (2.4).

Assume that n is large enough such that $2 C r_n \le \varepsilon$. We then apply Theorem 4, which is
Hoeffding's inequality for U-statistics, to the first term (the deviation term) on the right-hand side
with the kernel

$$\phi := \phi_{A,r_n} - \mathbb{E}[\phi_{A,r_n}(X_1, X_2)]$$

and $t = \omega_d r_n^d \varepsilon / 2$. The kernel satisfies $\|\phi\|_\infty \le 1$, and simple calculations yield

$$\mathrm{Var}(\phi(X_1, X_2)) \le \mathbb{E}[\phi_{A,r_n}(X_1, X_2)^2] \le \mu(A)\, \omega_d r_n^d / \tau_M \le \omega_d r_n^d / \tau_M.$$

From this we obtain the large deviation bound. The almost sure convergence is then a simple
consequence of the Borel-Cantelli Lemma. □
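
The Borel-Cantelli step requires the tail bounds to be summable in n. The sketch below illustrates this for one admissible sequence; everything numerical in it is an assumption of ours: the exponential bound is taken in the form $2\exp(-n r_n^d \varepsilon^2 / (C(1 + \varepsilon)))$, the radius is chosen so that $n r_n^d = (\log n)^2$ (hence $n r_n^d / \log n \to \infty$), and the unknown constant is set to C = 1.

```python
import math

def tail_bound(n, eps, C):
    """2 exp(-n r_n^d eps^2 / (C (1 + eps))) when n r_n^d = (log n)^2."""
    n_rd = math.log(n) ** 2
    return 2.0 * math.exp(-n_rd * eps * eps / (C * (1.0 + eps)))

def partial_sum(n_max, eps, C):
    """Sum of the tail bounds for n = 2, ..., n_max."""
    return sum(tail_bound(n, eps, C) for n in range(2, n_max + 1))

s_small = partial_sum(10_000, eps=0.5, C=1.0)
s_large = partial_sum(100_000, eps=0.5, C=1.0)
```

The terms decay like $\exp(-c (\log n)^2)$, faster than any power of n, so the partial sums stabilize; with $n r_n^d$ of order $\log n$ only, the terms would decay polynomially and summability would depend on the constant.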

Proposition 12. Fix a sequence $r_n \to 0$. Let A be an open subset of M with smooth relative
boundary and positive reach. There exists a constant C depending only on M such that, for any
$\varepsilon > 0$, and for all n large enough, we have

$$\mathbb{P}\left[|\nu_n(A) - \nu(A)| \ge \varepsilon\right] \le 2 \exp\left(-\frac{n r_n^{d+1} \varepsilon^2}{C(\nu(A) + \varepsilon)}\right).$$

In particular, if $n r_n^{d+1} / \log n \to \infty$, then

$$\nu_n(A) \to \nu(A), \quad n \to \infty, \quad \text{almost surely}.$$

Proof. By the triangle inequality, we have

$$|\nu_n(A) - \nu(A)| \le |\nu_n(A) - \mathbb{E}[\nu_n(A)]| + |\mathbb{E}[\nu_n(A)] - \nu(A)|.$$

Using the control on the bias in Lemma 10-(i), the second term on the right-hand side goes to 0 as
$n \to \infty$. Then for n large enough, we apply Hoeffding's inequality of Theorem 4 to the first term
on the right-hand side with the kernel

$$\phi := \bar\phi_{A,r_n} - \mathbb{E}[\bar\phi_{A,r_n}(X_1, X_2)]$$

and $t := \gamma_d r_n^{d+1} \nu(A) \varepsilon / 2$. The kernel satisfies $\|\phi\|_\infty \le 1$, hence

$$\mathrm{Var}(\phi(X_1, X_2)) \le \mathbb{E}[\bar\phi_{A,r_n}(X_1, X_2)^2] \le \mathbb{E}[\bar\phi_{A,r_n}(X_1, X_2)] \le 2\gamma_d \nu(A) r_n^{d+1} / \tau_M,$$

where the last inequality follows from the upper bound on the bias of Lemma 10-(i) for n large enough.
From this we obtain the large deviation bound, and the almost sure convergence is a consequence
of the Borel-Cantelli Lemma. □

2.5 Proof of Theorem 1

The first statement of Theorem 1 is an immediate consequence of the exponential inequalities of
Propositions 11 and 12.

To prove the second statement, under the conditions of Theorem 1, for any subset A with
smooth relative boundary, with probability one $\lim_n h_n(A) = h(A; M)$ while $h_n(A) \ge \frac{\omega_d}{\gamma_d r_n} H(G_{n,r_n})$,
so that $\limsup_n \frac{\omega_d}{\gamma_d r_n} H(G_{n,r_n}) \le h(A; M)$. Then we obtain the upper bound of Theorem 1 by taking
the infimum over all such subsets A.

3 Proof of Theorems 2 and 3: consistent estimation

Consistent estimation in the context of Theorem 2 is possible because the class $\mathcal{R}_n$ is sufficiently
rich as to include sets that approach Cheeger sets of M and its complexity is controlled, so as to
allow for a uniform convergence both in terms of discrete volume and discrete perimeter. This control
on the complexity of $\mathcal{R}_n$ we exploit in building a covering for $\mathcal{R}_n$, which is done in Section 3.1 and
later used to obtain uniform versions of Propositions 11 and 12. Then Part (i) of Theorem 2, which
states the convergence of a penalized graph Cheeger constant towards the Cheeger constant of M,
is proved in Section 3.7. Finally, Part (ii), which characterizes the accumulation points of a sequence
of minimizing sets, is proved in Section 3.8. The convergence of the discrete measures
associated with a sequence of minimizing sets (Theorem 3) is proved in Section 3.9.

3.1 Covering numbers

For $\rho > 0$, let $\mathcal{R}_\rho$ be the class of open subsets $R \subset (0, 1)^d$ with $\mathrm{reach}(\partial R) \ge \rho$. Let $d_H(R, R')$ be the
Hausdorff distance between two sets R and R', i.e.,

$$d_H(R, R') = \inf\{r > 0 : R \subset R' \oplus B(r) \text{ and } R' \subset R \oplus B(r)\}.$$

Denote by $N(\varepsilon, \mathcal{R}_\rho, d_H)$ the covering number of $\mathcal{R}_\rho$ for the Hausdorff distance, i.e., the minimal
number of balls of radius $\varepsilon$ for the Hausdorff distance, centered at elements in $\mathcal{R}_\rho$, that are needed
to cover $\mathcal{R}_\rho$.

Lemma 13. (i) There exists a constant C depending only on d such that, for any $\varepsilon > 0$ and any
$\rho > 0$:

$$\log N(\varepsilon, \mathcal{R}_\rho, d_H) \le C \varepsilon^{-d}.$$

(ii) If $0 < \varepsilon < \rho$, then for any R and R' in $\mathcal{R}_\rho$, if $d_H(R, R') \le \varepsilon$, then $R\, \Delta\, R' \subset \mathcal{V}(\partial R, \varepsilon) \cap \mathcal{V}(\partial R', \varepsilon)$.

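Proof. Let $x_1, \ldots, x_n$ be an $\varepsilon$-packing of $(0, 1)^d$, so $\bigcup_{i=1}^n B(x_i, \varepsilon)$ covers $(0, 1)^d$ and $n \le C \varepsilon^{-d}$ for
some constant C depending only on d. For any set R in $\mathcal{R}_\rho$, define

$$I_\varepsilon(R) = \{i = 1, \ldots, n : B(x_i, \varepsilon) \cap R \neq \emptyset\}.$$

Then clearly, by definition of the covering, $R \subset \bigcup_{i \in I_\varepsilon(R)} B(x_i, \varepsilon)$, and

$$\bigcup_{i \in I_\varepsilon(R)} B(x_i, \varepsilon) \subset R \oplus B(2\varepsilon).$$

Therefore

$$d_H\Big(\bigcup_{i \in I_\varepsilon(R)} B(x_i, \varepsilon),\, R\Big) \le 2\varepsilon.$$

Since when R ranges in $\mathcal{R}_\rho$, the cardinality of sets of the form $\bigcup_{i \in I_\varepsilon(R)} B(x_i, \varepsilon)$ is bounded by $2^n$, the
collection of Hausdorff balls of radius $2\varepsilon$ centered at sets of the form $\bigcup_{i \in I} B(x_i, \varepsilon)$, where I is
any subset of $\{1, \ldots, n\}$, covers $\mathcal{R}_\rho$. By doubling the radius of the balls, we can take centers in $\mathcal{R}_\rho$,
which proves the first part of the lemma.

The second part follows from the fact that if $\mathrm{reach}(\partial R) \ge \rho$, then $\partial R \oplus B(\rho) = \mathcal{V}(\partial R, \rho)$,
assuming, without loss of generality, that $\partial R$ has no boundary. □

We mention that the bound on the $\varepsilon$-entropy of $\mathcal{R}_\rho$ is rather weak. Standard results by Kolmogorov
and Tikhomirov [25] suggest a bound of the form $C(\rho\varepsilon)^{-(d-1)/2}$. Such a result would
change the exponent for $r_n$ in Theorem 2 to $(3d + 1)/2$.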

3.2 Perimeter bounds of a regular set 

The classical isoperimetric inequality provides a bound on the volume of a Borel set $R$ in terms of its perimeter (see e.g., Evans and Gariepy, 1992):

$$d\,\omega_d^{1/d}\,\mathrm{Vol}_d(R)^{1-1/d} \le \mathrm{Vol}_{d-1}(\partial R). \qquad (3.1)$$

But, in the case where $\partial R$ has positive reach, the perimeter may in turn be bounded by the volume, as stated in Lemma 14 below. The proof uses the following inequality: for all Borel sets $R$ and $S$,

$$\mathrm{Vol}_{d-1}(\partial(R \cup S)) + \mathrm{Vol}_{d-1}(\partial(R \cap S)) \le \mathrm{Vol}_{d-1}(\partial R) + \mathrm{Vol}_{d-1}(\partial S). \qquad (3.2)$$

Lemma 14. Let $R$ be a bounded open subset of $\mathbb{R}^d$ with $\mathrm{reach}(\partial R) = \rho > 0$. Then,

$$\mathrm{Vol}_{d-1}(\partial R) \le d\,\mathrm{Vol}_d(R)/\rho.$$

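Proof. Since $\mathrm{reach}(\partial R) = \rho > 0$, a ball of radius $\rho$ rolls freely in $R$. Consequently $R$ can be written as a countable union of balls of radius $\rho$, i.e.,

$$R = \bigcup_{i=1}^{\infty} B(x_i, \rho).$$

Set $R_n = \bigcup_{i=1}^{n} B_i$, where $B_i = B(x_i, \rho)$.

Using the decomposition $R_{n+1} = R_n \cup B_{n+1}$, on the one hand we have

$$\mathrm{Vol}_d(R_{n+1}) = \mathrm{Vol}_d(R_n \cup B_{n+1}) = \mathrm{Vol}_d(R_n) + \omega_d\rho^d - \mathrm{Vol}_d(R_n \cap B_{n+1}),$$

and on the other hand, using inequality (3.2), we have

$$\mathrm{Vol}_{d-1}(\partial R_{n+1}) = \mathrm{Vol}_{d-1}(\partial(R_n \cup B_{n+1})) \le \mathrm{Vol}_{d-1}(\partial R_n) + d\,\omega_d\rho^{d-1} - \mathrm{Vol}_{d-1}(\partial(R_n \cap B_{n+1})).$$

Consequently

$$\mathrm{Vol}_{d-1}(\partial R_{n+1}) - \frac{d}{\rho}\mathrm{Vol}_d(R_{n+1}) \le \mathrm{Vol}_{d-1}(\partial R_n) - \frac{d}{\rho}\mathrm{Vol}_d(R_n) + \frac{d}{\rho}\mathrm{Vol}_d(R_n \cap B_{n+1}) - \mathrm{Vol}_{d-1}(\partial(R_n \cap B_{n+1})).$$

But, using the isoperimetric inequality (3.1), we may write

$$\frac{d}{\rho}\mathrm{Vol}_d(R_n \cap B_{n+1}) - \mathrm{Vol}_{d-1}(\partial(R_n \cap B_{n+1})) \le \frac{d}{\rho}\mathrm{Vol}_d(R_n \cap B_{n+1}) - d\,\omega_d^{1/d}\,\mathrm{Vol}_d(R_n \cap B_{n+1})^{1-1/d}$$

$$= d\,\mathrm{Vol}_d(R_n \cap B_{n+1})^{1-1/d}\left[\frac{\mathrm{Vol}_d(R_n \cap B_{n+1})^{1/d}}{\rho} - \omega_d^{1/d}\right] \le 0,$$

since, in the last bracket, $\mathrm{Vol}_d(R_n \cap B_{n+1}) \le \mathrm{Vol}_d(B_{n+1}) = \omega_d\rho^d$. Therefore, for all $n \ge 1$, we have

$$\mathrm{Vol}_{d-1}(\partial R_{n+1}) - \frac{d}{\rho}\mathrm{Vol}_d(R_{n+1}) \le \mathrm{Vol}_{d-1}(\partial R_n) - \frac{d}{\rho}\mathrm{Vol}_d(R_n).$$

But since $R_1$ is a ball of radius $\rho$, we have $\mathrm{Vol}_{d-1}(\partial R_1) - d\,\mathrm{Vol}_d(R_1)/\rho = 0$, and so

$$\mathrm{Vol}_{d-1}(\partial R_n) - \frac{d}{\rho}\mathrm{Vol}_d(R_n) \le 0 \quad \text{for all } n \ge 1.$$

Since $R_n$ converges to $R$ in $L^1$, it follows from the lower semi-continuity of the perimeter, see e.g. [23, Prop. 2.3.6], that $\liminf_n \mathrm{Vol}_{d-1}(\partial R_n) \ge \mathrm{Vol}_{d-1}(\partial R)$. This concludes the proof. □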
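As a quick sanity check of the bound $\mathrm{Vol}_{d-1}(\partial R) \le d\,\mathrm{Vol}_d(R)/\rho$, the following Python sketch evaluates the gap between the two sides for two shapes chosen here purely for illustration: a ball of radius $\rho$ in dimension $d$, where equality holds, and a planar stadium (a union of discs of radius $\rho$ centered along a segment), whose boundary has reach $\rho$. The function names are hypothetical.

```python
import math


def unit_ball_volume(d):
    """Volume omega_d of the unit ball in dimension d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def ball_perimeter_gap(d, rho):
    """d * Vol_d(B) / rho - Vol_{d-1}(boundary of B) for a ball B of radius rho."""
    omega = unit_ball_volume(d)
    volume = omega * rho ** d
    perimeter = d * omega * rho ** (d - 1)
    return d * volume / rho - perimeter


def stadium_perimeter_gap(rho, length):
    """Same gap in the plane for a stadium with radius rho and straight part `length`."""
    area = math.pi * rho ** 2 + 2 * rho * length
    perimeter = 2 * math.pi * rho + 2 * length
    return 2 * area / rho - perimeter
```

For the stadium the gap equals twice the length of the straight part, so the inequality is strict as soon as the set is not a single ball.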



3.3 Exponential inequalities 

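We prove the uniform versions of Propositions 11 and 12 for the class $\mathcal{R}_\rho$.

Proposition 15. There exists a constant $C$ depending only on $M$ such that, for any $\varepsilon, r > 0$ and all $n$ satisfying $n r^d \rho^d \varepsilon^{d+2} \ge C$ and $\varepsilon \ge Cr$, we have

$$P\Big(\sup_{R \in \mathcal{R}_\rho} |\mu_n(R) - \mu(R)| \ge \varepsilon\Big) \le 2\exp\left(-\frac{n r^d \varepsilon^2}{C(1+\varepsilon)}\right). \qquad (3.3)$$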
Proof. The bias term is dealt with exactly as in Proposition 11, obtaining

$$|E[\mu_n(R)] - \mu(R)| \le C_0 r,$$

valid for all $R \in \mathcal{R}_\rho$, so assuming $\varepsilon \ge 2C_0 r$, we may focus on bounding the variance term

$$\mu_n(R) - E[\mu_n(R)].$$

Define the kernel class

$$\Gamma = \{\phi_{R,r} : R \in \mathcal{R}_\rho\}, \qquad (3.4)$$

where $\phi_{R,r}$ is defined in (2.1). Let $U_n(\phi)$ be the U-process over $\Gamma$ defined by

$$U_n(\phi) = \frac{1}{n(n-1)} \sum_{i \neq j} \phi(X_i, X_j).$$

Observe that

$$\sup_{R \in \mathcal{R}_\rho} |\mu_n(R) - E[\mu_n(R)]| = \frac{\tau_M}{\omega_d r^d}\, \sup_{\phi \in \Gamma} |U_n(\phi) - \mu^{\otimes 2}(\phi)|.$$

Consider a minimal covering of $\mathcal{R}_\rho$ of cardinal $K$ by balls centered at elements $R_1, \dots, R_K$ of $\mathcal{R}_\rho$, and of radius $\eta < \rho$ for the Hausdorff distance. By Lemma 13,

$$\log K \le C_1 (1/\eta)^d.$$

For any $R$ in $\mathcal{R}_\rho$, there exists $1 \le k \le K$ such that $d_H(R, R_k) \le \eta$, which implies that $R \triangle R_k \subset \mathcal{V}(\partial R_k, \eta)$. Also, by Lemma 8, there exists a constant $C_2$ depending only on the dimension $d$ such that $\mathrm{Vol}_d(\mathcal{V}(\partial R_k, \eta)) \le C_2\eta/\rho$, for all $1 \le k \le K$, which implies that

$$\mu(\mathcal{V}(\partial R_k, \eta)) \le C_3\eta/\rho, \quad \text{for all } 1 \le k \le K,$$

since $\eta < \rho$, and where $C_3$ now depends on $M$.
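We have

$$|\phi_{R,r}(x,y) - \phi_{R_k,r}(x,y)| \le \tfrac{1}{2}\big(1_{R \triangle R_k}(x) + 1_{R \triangle R_k}(y)\big)\, 1\{\|x - y\| \le r\}.$$

Next, consider the inequality

$$|U_n(\phi_{R,r}) - \mu^{\otimes 2}(\phi_{R,r})| \le |U_n(\phi_{R,r}) - U_n(\phi_{R_k,r})| + |\mu^{\otimes 2}(\phi_{R,r}) - \mu^{\otimes 2}(\phi_{R_k,r})| + |U_n(\phi_{R_k,r}) - \mu^{\otimes 2}(\phi_{R_k,r})|.$$

For the double expectations, we have

$$|\mu^{\otimes 2}(\phi_{R,r}) - \mu^{\otimes 2}(\phi_{R_k,r})| \le E\big[1_{R \triangle R_k}(X_1)\, 1\{\|X_1 - X_2\| \le r\}\big] = \int_{R \triangle R_k} \mu(B(x,r))\, \mu(dx)$$

$$\le \int_{\mathcal{V}(\partial R_k, \eta)} \mu(B(x,r))\, \mu(dx) \le \frac{\omega_d r^d}{\tau_M}\, \mu(\mathcal{V}(\partial R_k, \eta)) \le C_4 r^d \eta/\rho,$$

with $C_4$ still depending only on $M$. The last inequality is a consequence of Lemmas 8 and 14 and the fact that $\mathrm{Vol}_d(R_k) \le 1$ since $R_k \subset (0,1)^d$.

For the empirical averages, we have

$$|U_n(\phi_{R,r}) - U_n(\phi_{R_k,r})| \le \frac{1}{n(n-1)} \sum_{i \neq j} \tfrac{1}{2}\big(1_{R \triangle R_k}(X_i) + 1_{R \triangle R_k}(X_j)\big)\, 1\{\|X_i - X_j\| \le r\} \le U_n(\phi_{\mathcal{V}(\partial R_k, \eta), r}).$$

Therefore,

$$\sup_{R \in \mathcal{R}_\rho} |U_n(\phi_{R,r}) - \mu^{\otimes 2}(\phi_{R,r})| \le \max_{1 \le k \le K} U_n(\phi_{\mathcal{V}(\partial R_k, \eta), r}) + C_4\frac{r^d\eta}{\rho} + \max_{1 \le k \le K} |U_n(\phi_{R_k,r}) - \mu^{\otimes 2}(\phi_{R_k,r})|.$$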

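Consequently, for any $\varepsilon > 0$, we may write

$$P\Big(\sup_{R \in \mathcal{R}_\rho} |\mu_n(R) - E[\mu_n(R)]| \ge \varepsilon\Big) = P\Big(\sup_{\phi \in \Gamma} |U_n(\phi) - \mu^{\otimes 2}(\phi)| \ge \frac{\omega_d r^d \varepsilon}{\tau_M}\Big)$$

$$\le P\Big(\max_{1 \le k \le K} U_n(\phi_{\mathcal{V}(\partial R_k, \eta), r}) \ge \frac{\omega_d r^d \varepsilon}{2\tau_M} - C_4\frac{r^d\eta}{\rho}\Big) + P\Big(\max_{1 \le k \le K} |U_n(\phi_{R_k,r}) - \mu^{\otimes 2}(\phi_{R_k,r})| \ge \frac{\omega_d r^d \varepsilon}{2\tau_M}\Big)$$

$$\le K \max_{1 \le k \le K} P\Big(U_n(\phi_{\mathcal{V}(\partial R_k, \eta), r}) \ge \frac{\omega_d r^d \varepsilon}{2\tau_M} - C_4\frac{r^d\eta}{\rho}\Big) + K \max_{1 \le k \le K} P\Big(|U_n(\phi_{R_k,r}) - \mu^{\otimes 2}(\phi_{R_k,r})| \ge \frac{\omega_d r^d \varepsilon}{2\tau_M}\Big)$$

by the union bound. To bound the first term, note first that

$$\mathrm{Var}\big(\phi_{\mathcal{V}(\partial R_k, \eta), r}(X_1, X_2)\big) \le E\big[\phi_{\mathcal{V}(\partial R_k, \eta), r}(X_1, X_2)^2\big] \le E\big[\phi_{\mathcal{V}(\partial R_k, \eta), r}(X_1, X_2)\big],$$

with

$$E\big[\phi_{\mathcal{V}(\partial R_k, \eta), r}(X_1, X_2)\big] \le \frac{\omega_d r^d}{\tau_M}\, \mu(\mathcal{V}(\partial R_k, \eta)) \le C_4\frac{r^d\eta}{\rho},$$

for the same reasons as above. Now take $\eta = \rho \min(\omega_d\varepsilon/(8C_4\tau_M), 1)$. Then, for any $1 \le k \le K$, by Hoeffding's inequality for U-statistics (Theorem 4), we have

$$P\Big(U_n(\phi_{\mathcal{V}(\partial R_k, \eta), r}) \ge \frac{\omega_d r^d \varepsilon}{2\tau_M} - C_4\frac{r^d\eta}{\rho}\Big) \le \exp\left(-\frac{n(\omega_d r^d\varepsilon/4\tau_M)^2}{5(C_4 r^d\eta/\rho) + 3(\omega_d r^d\varepsilon/4\tau_M)}\right) \le \exp\left(-\frac{n r^d \varepsilon}{C_5}\right),$$

for a constant $C_5 > 0$ depending only on $M$. To bound the second term, since

$$\mathrm{Var}(\phi_{R_k,r}(X_1, X_2)) \le E[\phi_{R_k,r}(X_1, X_2)] \le \omega_d r^d/\tau_M,$$

we may apply Theorem 4 again to obtain the bound

$$P\Big(|U_n(\phi_{R_k,r}) - \mu^{\otimes 2}(\phi_{R_k,r})| \ge \frac{\omega_d r^d \varepsilon}{2\tau_M}\Big) \le \exp\left(-\frac{n(\omega_d r^d\varepsilon/2\tau_M)^2}{5\omega_d r^d/\tau_M + 3(\omega_d r^d\varepsilon/2\tau_M)}\right) \le \exp\left(-\frac{n r^d \varepsilon^2}{C_6(1+\varepsilon)}\right),$$

for a constant $C_6 > 0$ depending only on $M$.

With the choice of $\eta$ as above, the cardinal $K$ of the covering is such that $\log(K) \le C_7(\varepsilon\rho)^{-d}$, for some constant $C_7$ depending only on $M$, and we obtain the bound

$$P\Big(\sup_{R \in \mathcal{R}_\rho} |\mu_n(R) - E[\mu_n(R)]| \ge \varepsilon\Big) \le K\exp\left(-\frac{n r^d \varepsilon}{C_5}\right) + K\exp\left(-\frac{n r^d \varepsilon^2}{C_6(1+\varepsilon)}\right)$$

$$\le 2\exp\left(C_7(\varepsilon\rho)^{-d} - \frac{n r^d \varepsilon^2}{C_8(1+\varepsilon)}\right) \le 2\exp\left(-\frac{n r^d \varepsilon^2}{C_9(1+\varepsilon)}\right),$$

if $n r^d \varepsilon^{d+2} \rho^d \ge C_9$, for a constant $C_9$ depending only on $M$. □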
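The U-process appearing in this proof averages a symmetric kernel over all ordered pairs of distinct sample points. The Python sketch below computes such a U-statistic. Definition (2.1) is not reproduced in this section, so the kernel used here, $\tfrac12(1_R(x) + 1_R(y))\,1\{\|x-y\| \le r\}$, is an assumption suggested by the bounds used in the proof, and the function names are hypothetical.

```python
import math


def kernel(x, y, in_region, r):
    """Assumed symmetric kernel: mean of the two region indicators times the r-neighbor indicator."""
    close = 1.0 if math.dist(x, y) <= r else 0.0
    return 0.5 * (float(in_region(x)) + float(in_region(y))) * close


def u_statistic(sample, phi):
    """Average of phi over all ordered pairs (i, j) with i != j."""
    n = len(sample)
    total = sum(phi(sample[i], sample[j])
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))
```

For example, with the three points $(0,0)$, $(0.1,0)$, $(1,1)$, the region $\{x_1 < 0.05\}$ and $r = 0.2$, only the pair formed by the first two points contributes, and the statistic equals $1/6$.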



For the perimeter, we only control the variance, as the bias may not be controlled uniformly over $\mathcal{R}_\rho$. Indeed, consider the case where $M$ is a hypercube with rounded corners so as to satisfy the condition on its reach, and let $R$ be another hypercube with rounded corners included in $M$ and sharing one of its faces with $M$. Then, given a sample $X_1, \dots, X_n$, it is possible to translate $R$ inside $M$ just enough that the translate does not share a boundary with $M$, while its discrete volume and perimeter are left equal to those of $R$.

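Proposition 16. There exists a constant $C$ depending only on $M$ such that, for any $\varepsilon > 0$, $\rho < 1$, $r < \min(\mathrm{reach}(M), \rho/2)$ and all $n$ satisfying $n r^{2d+1} \rho^{d+1} \varepsilon^{d+2} \ge C$, we have

$$P\Big(\sup_{R \in \mathcal{R}_\rho} |\nu_n(R) - E[\nu_n(R)]| \ge \varepsilon\Big) \le 2\exp\left(-\frac{n r^{d+1} \rho\, \varepsilon^2}{C(1+\rho\varepsilon)}\right).$$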


Proof. The proof follows that of Proposition 15, with the symmetric kernel $\bar\phi_{R,r}$ defined in (2.2) and the class $\Gamma$ defined in (3.4) with $\phi_{R,r}$ replaced by $\bar\phi_{R,r}$. Observe that

$$\sup_{R \in \mathcal{R}_\rho} |\nu_n(R) - E[\nu_n(R)]| = \frac{\tau_M}{\gamma_d r^{d+1}}\, \sup_{\phi \in \Gamma} |U_n(\phi) - \mu^{\otimes 2}(\phi)|.$$

As in the proof of Proposition 15, we start with a minimal covering of $\mathcal{R}_\rho$ of cardinal $K$ by balls of radius $\eta$ for the Hausdorff distance. For any $R$ in $\mathcal{R}_\rho$ at a Hausdorff distance no more than $\eta$ from an element $R_k$ of the covering, we have

Hence, 

and therefore, following the same arguments, 
for a constant Ci depending only on M; and also, 

$$|U_n(\bar\phi_{R,r}) - U_n(\bar\phi_{R_k,r})| \le 2\,U_n(\phi_{\mathcal{V}(\partial R_k, \eta), r}).$$

Hence 

Pfsup \Vn{R)-^{VniR)]\>S 



+ max n\Un{^R,,r)-lf\^R,,r)\ > ^^'"^ ^ 



Take $\eta = \rho \min(\gamma_d r \varepsilon/(4C_1\tau_M), 1)$. For the first term, for any $1 \le k \le K$, we have

- ^""^X 5(Cir''rj/p) + 3(yy^'s/STM)l 
^''Pr C2(l+e)j' 

for some constant $C_2 > 0$ depending only on $M$. For the second term, since by Lemma 10, when $r < \rho/2$,

$$\mathrm{Var}(\bar\phi_{R_k,r}) \le C_3 r^{d+1}/\rho,$$

for a constant $C_3$ depending only on $M$, we have

«(0«,,r) - > -1: < exp -n — 2^—- — - 

2tm I \ SCCsr^'+Vp) + 3(y,/r"+ie/2TM)/ 
for a constant $C_4 > 0$ depending only on $M$. Finally, with the choice of $\eta$ as above, the cardinal $K$ of the covering is such that $\log(K) \le C_5 (r\rho\varepsilon)^{-d}$, for $C_5$ depending only on $M$.
Then 

ifnr2''+yV+2 >C6, and 

if fip-d+\ pd+\ ^ where Q and C7 depend on M only. Combining these inequalities, we 
conclude. □ 



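3.4 A uniform control on $h_n(A)$

As we argued earlier, the boundary of $M$ makes a uniform convergence of the perimeters of sets in $\mathcal{R}_n$ impossible. Our way around that is to compare the discrete perimeter of a set $R$ with its perimeter inside $M_{r_n}$, i.e., $\mathrm{Vol}_{d-1}(\partial R \cap M_{r_n})$, thus avoiding the boundary of $M$. This leads to a comparison between $h_n(R)$ and $h(R; M_{r_n})$. We relate the latter to $h(R; M)$ in Section 3.5.

Lemma 17. Under the conditions of Theorem 2, with probability one, we have:

$$\liminf_{n \to \infty} \inf_{R \in \mathcal{R}_n} \big(h_n(R) - h(R; M_{r_n})\big) \ge 0. \qquad (3.5)$$

Proof. Take $R \in \mathcal{R}_n$ and define

$$\Delta_n(R) = \min(\mu_n(R), \mu_n(R^c)), \qquad \Delta_n^*(R) = \frac{1}{\tau_M} \min\big(\mathrm{Vol}_d(R \cap M_{r_n}), \mathrm{Vol}_d(R^c \cap M_{r_n})\big),$$

as well as

$$\nu_n^*(R) = \frac{1}{\tau_M} \mathrm{Vol}_{d-1}(\partial R \cap M_{r_n}).$$

Then

$$h_n(R) - h(R; M_{r_n}) = \frac{1}{\Delta_n(R)}\big(\nu_n(R) - \nu_n^*(R)\big) + \frac{\nu_n^*(R)}{\Delta_n(R)\Delta_n^*(R)}\big(\Delta_n^*(R) - \Delta_n(R)\big) =: \xi_n(R) + \zeta_n(R).$$

Define the event

$$\Omega_n = \left\{\frac{1}{2} \le \frac{\Delta_n(R)}{\Delta_n^*(R)} \le \frac{3}{2}, \ \forall R \in \mathcal{R}_n\right\}.$$

We will see that $P[\Omega_n] \to 1$.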
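The decomposition of the difference of two normalized cuts into a perimeter term and a volume term is a purely algebraic identity. The following Python sketch checks it in exact rational arithmetic; the argument names (a discrete perimeter and minimum volume, and their continuous counterparts) and the sample values are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def split_difference(nu_n, delta_n, nu_star, delta_star):
    """Split nu_n/delta_n - nu_star/delta_star into a perimeter term and a volume term."""
    perimeter_term = (nu_n - nu_star) / delta_n
    volume_term = nu_star * (delta_star - delta_n) / (delta_n * delta_star)
    return perimeter_term, volume_term


def in_good_event(delta_n, delta_star):
    """Check that the ratio of the two minimum volumes lies between 1/2 and 3/2."""
    return Fraction(1, 2) <= delta_n / delta_star <= Fraction(3, 2)
```

On the event where the ratio of volumes stays between 1/2 and 3/2, both terms can be controlled by deviations of the discrete quantities from their continuous counterparts, which is how the proof proceeds.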
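Bounding $\xi_n(R)$. By definition of $\mathcal{R}_n$, the sets $R$ and $R^c$ each contain a ball of radius $\rho_n$, and by Lemma 6 the volume of the intersection of this ball with $M_{r_n}$ is bounded from below by $C_1\rho_n^d$, for a constant $C_1$ depending only on $M$. Hence,

$$\Delta_n^*(R) \ge C_1\rho_n^d. \qquad (3.6)$$

Also, on $\Omega_n$, $\Delta_n(R) \ge \Delta_n^*(R)/2$. These last two inequalities being valid for all $R \in \mathcal{R}_n$, for $\varepsilon > 0$ we have

$$I_1 := P\Big(\Big\{\inf_{R \in \mathcal{R}_n} \xi_n(R) \le -\varepsilon\Big\} \cap \Omega_n\Big) \le P\Big(\inf_{R \in \mathcal{R}_n} \big(\nu_n(R) - \nu_n^*(R)\big) \le -C_2\varepsilon\rho_n^d\Big)$$

$$\le P\Big(\inf_{R \in \mathcal{R}_n} \big(\nu_n(R) - E[\nu_n(R)]\big) + \inf_{R \in \mathcal{R}_n} \big(E[\nu_n(R)] - \nu_n^*(R)\big) \le -C_2\varepsilon\rho_n^d\Big),$$

for a constant $C_2 = C_1/2 > 0$. Using the bias bounds of Lemma 10 together with the perimeter bound in Lemma 14, we have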



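Hence, since $r_n = o(\rho_n^\alpha)$ for any $\alpha > 0$, for $\varepsilon$ fixed and all $n$ large enough, we have

$$I_1 \le P\Big(\inf_{R \in \mathcal{R}_n} \big(\nu_n(R) - E[\nu_n(R)]\big) \le -C_2\varepsilon\rho_n^d/2\Big) \le P\Big(\sup_{R \in \mathcal{R}_{\rho_n}} |\nu_n(R) - E[\nu_n(R)]| \ge C_2\varepsilon\rho_n^d/2\Big),$$

where the second inequality comes from the fact that $\mathcal{R}_n \subset \mathcal{R}_{\rho_n}$. By the fact that $n r_n^{2d+1}\rho_n^\alpha \to \infty$ for any $\alpha > 0$, the conditions of Proposition 16 are satisfied, so that

$$I_1 \le C_4 \exp\left(-\frac{n r_n^{d+1}\rho_n^{2d+1}\varepsilon^2}{C_4(1+\varepsilon)}\right)$$

for some constant $C_4 > 0$ and all $n$ large enough. At last, we have

$$\frac{n r_n^{d+1}\rho_n^{2d+1}}{\log(n)} = n r_n^{2d+1}\, \frac{r_n^{-d}\rho_n^{2d+1}}{\log(n)} \to +\infty,$$

since $r_n = o(\rho_n^\alpha)$ for any $\alpha > 0$ and $r_n \to 0$ polynomially in $n$. We deduce that, for all $\varepsilon > 0$,

$$\sum_n P\Big(\Big\{\inf_{R \in \mathcal{R}_n} \xi_n(R) \le -\varepsilon\Big\} \cap \Omega_n\Big) < \infty. \qquad (3.7)$$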



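Bounding $\zeta_n(R)$. (We reset the constants, except for $C_1$.) By the perimeter bound of Lemma 14, we have

$$\nu_n^*(R) \le \frac{d\,\mathrm{Vol}_d(R)}{\tau_M\rho_n} \le \frac{d}{\tau_M\rho_n} = C_2/\rho_n,$$

for a constant $C_2 > 0$ depending only on $M$. So, together with (3.6) and the fact that, on $\Omega_n$, $\Delta_n(R) \ge \Delta_n^*(R)/2$,

$$\frac{\nu_n^*(R)}{\Delta_n(R)\Delta_n^*(R)} \le \frac{C_3}{\rho_n^{2d+1}}$$

for all $R$ in $\mathcal{R}_n$. It follows that

$$I_2 := P\Big(\Big\{\inf_{R \in \mathcal{R}_n} \zeta_n(R) \le -\varepsilon\Big\} \cap \Omega_n\Big) \le P\Big(\sup_{R \in \mathcal{R}_n} |\Delta_n(R) - \Delta_n^*(R)| \ge \frac{\rho_n^{2d+1}\varepsilon}{C_3}\Big). \qquad (3.8)$$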



Define 



Then 



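with $\mu(M_{r_n}^c) \le C_4 r_n$ by (2.4). For $\varepsilon > 0$ fixed and $n$ large enough, $2C_4 r_n \le \rho_n^{2d+1}\varepsilon/C_3$, again by the fact that $\rho_n \to 0$ sub-polynomially in $r_n$. We therefore obtain that

$$I_2 \le 2P\Big(\sup_{R \in \mathcal{R}_{\rho_n}} |\mu_n(R) - \mu(R)| \ge \frac{\rho_n^{2d+1}\varepsilon}{2C_3}\Big),$$

where we used the fact that $R^c \in \mathcal{R}_n$ when $R \in \mathcal{R}_n$, together with $\mathcal{R}_n \subset \mathcal{R}_{\rho_n}$. We then apply Proposition 15, whose conditions are satisfied for $\varepsilon > 0$ fixed and $n$ large enough, again because $\rho_n \to 0$ very slowly, arriving at

$$I_2 \le C_4 \exp\left(-\frac{n r_n^d \rho_n^{4d+2}\varepsilon^2}{C_4(1+\varepsilon)}\right)$$

for some constant $C_4 > 0$ and all $n$ large enough. As before, when $\varepsilon$ is fixed, the exponent is a positive power of $n$, so that

$$\sum_n P\Big(\Big\{\inf_{R \in \mathcal{R}_n} \zeta_n(R) \le -\varepsilon\Big\} \cap \Omega_n\Big) < \infty. \qquad (3.9)$$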



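Bounding $P[\Omega_n^c]$. Since $\Delta_n^*(R) \ge C\rho_n^d$ for some $C$ uniformly over $R \in \mathcal{R}_n$ (see (3.6) above), we have

$$P(\Omega_n^c) = P\Big(\sup_{R \in \mathcal{R}_n} \frac{|\Delta_n(R) - \Delta_n^*(R)|}{\Delta_n^*(R)} > \frac{1}{2}\Big) \le P\Big(\sup_{R \in \mathcal{R}_n} |\Delta_n(R) - \Delta_n^*(R)| \ge C\rho_n^d/2\Big).$$

We then proceed as in bounding (3.8), obtaining

$$\sum_n P(\Omega_n^c) < \infty. \qquad (3.10)$$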
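Conclusion. We have

$$P\Big(\inf_{R \in \mathcal{R}_n} \big(h_n(R) - h(R; M_{r_n})\big) \le -2\varepsilon\Big) \le P\Big(\Big\{\inf_{R \in \mathcal{R}_n} \xi_n(R) \le -\varepsilon\Big\} \cap \Omega_n\Big) + P\Big(\Big\{\inf_{R \in \mathcal{R}_n} \zeta_n(R) \le -\varepsilon\Big\} \cap \Omega_n\Big) + P(\Omega_n^c),$$

so that the left-hand side is summable. Therefore, we conclude by applying the Borel-Cantelli lemma. □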



3.5 Some continuity of the Cheeger constant 

Our proof of Theorem 2 relies on continuity properties of the normalized cut and of the Cheeger constant. Lemma 18 below compares the conductance function on $M$ and on a bi-Lipschitz deformation of $M$. For a Lipschitz map $f$, let $\|f\|_{\mathrm{Lip}}$ denote its Lipschitz constant. If $f$ is bi-Lipschitz, we define its condition number by $\mathrm{cond}(f) := \|f\|_{\mathrm{Lip}} \|f^{-1}\|_{\mathrm{Lip}}$. Lemma 19 below states that $M_r$ is a bi-Lipschitz deformation of $M$, hence Lemma 18 yields the continuity property of Proposition 20.

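Lemma 18. Let $f$ be a bi-Lipschitz map on $M$. Then for any measurable $A \subset M$,

$$\max\left\{\frac{h(f(A); f(M))}{h(A; M)},\ \frac{h(A; M)}{h(f(A); f(M))}\right\} \le \mathrm{cond}(f)^d.$$

Proof. For any $A \subset M$, $\partial f(A) = f(\partial A)$ and $f(A)^c \cap f(M) = f(A^c \cap M)$, and if $A$ is measurable, for $k = 1, \dots, d$,

$$\|f^{-1}\|_{\mathrm{Lip}}^{-k}\, \mathrm{Vol}_k(A) \le \mathrm{Vol}_k(f(A)) \le \|f\|_{\mathrm{Lip}}^{k}\, \mathrm{Vol}_k(A).$$

Therefore,

$$h(f(A); f(M)) = \frac{\mathrm{Vol}_{d-1}(f(\partial A \cap M))}{\min\{\mathrm{Vol}_d(f(A)), \mathrm{Vol}_d(f(A^c \cap M))\}} \le \frac{\|f\|_{\mathrm{Lip}}^{d-1}}{\|f^{-1}\|_{\mathrm{Lip}}^{-d}} \cdot \frac{\mathrm{Vol}_{d-1}(\partial A \cap M)}{\min\{\mathrm{Vol}_d(A), \mathrm{Vol}_d(A^c \cap M)\}} \le \mathrm{cond}(f)^d\, h(A; M).$$

And vice-versa. □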

Lemma 19. Fix $r < s < \mathrm{reach}(\partial M)$. Then there is a bi-Lipschitz map between $M_r$ and $M$ that leaves $M_s$ unchanged, and with condition number at most $(1 + 2r/(s - r))^2$.

Proof. For $x$ in $M$ such that $\delta(x) := \mathrm{dist}(x, \partial M) < s$, let $\xi(x) \in \partial M$ be its metric projection onto $\partial M$ and $u_x$ be the unit normal vector of $\partial M$ at $\xi(x)$ pointing outwards. We define the map

$$f_r : M_r \to M, \qquad f_r(x) = x + \frac{r\,(s - \delta(x))_+}{s - r}\, u_x,$$

where $a_+$ denotes the positive part of $a \in \mathbb{R}$. By construction, $f_r$ is one-to-one, with inverse

$$f_r^{-1} : M \to M_r, \qquad f_r^{-1}(x) = x - \frac{r\,(s - \delta(x))_+}{s}\, u_x.$$

By [20, Thm. 4.8(1)], $\delta$ is Lipschitz with constant at most 1, therefore so is $x \mapsto (s - \delta(x))_+$; and since the reach bounds the radius of curvature from below [20, Thm. 4.18], $x \mapsto u_x$ is Lipschitz with constant at most $1/\mathrm{reach}(\partial M)$. Therefore, using the fact that $(s - \delta(x))_+ \le s$ and $\|u_x\| = 1$, $f_r$ and $f_r^{-1}$ are Lipschitz with constants at most $1 + 2r/(s - r)$ and $1 + 2r/s$ respectively. □
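To illustrate the kind of deformation used in this proof, here is a minimal Python sketch for the special case where $M$ is the open unit disc in the plane, an assumption made only for this example (then $\mathrm{reach}(\partial M) = 1$, $\delta(x) = 1 - \|x\|$ and $u_x = x/\|x\|$). The map pushes points of the smaller disc $M_r$ radially outwards inside the collar of width $s$ and leaves points at depth at least $s$ fixed. The function names and the explicit inverse formula are part of the sketch.

```python
import math


def delta(x):
    """Distance from x to the boundary of the unit disc (assumed domain M)."""
    return 1.0 - math.hypot(x[0], x[1])


def outward_normal(x):
    """Unit outward normal at the projection of x onto the unit circle."""
    norm = math.hypot(x[0], x[1])
    return (x[0] / norm, x[1] / norm)


def f_r(x, r, s):
    """Push x outwards by r (s - delta(x))_+ / (s - r) along the normal."""
    push = r * max(s - delta(x), 0.0) / (s - r)
    if push == 0.0:
        return x  # points at depth >= s are left unchanged
    u = outward_normal(x)
    return (x[0] + push * u[0], x[1] + push * u[1])


def f_r_inverse(y, r, s):
    """Pull y inwards by r (s - delta(y))_+ / s along the normal."""
    pull = r * max(s - delta(y), 0.0) / s
    if pull == 0.0:
        return y
    u = outward_normal(y)
    return (y[0] - pull * u[0], y[1] - pull * u[1])
```

With $r = 0.1$ and $s = 0.5$, a point at depth exactly $r$ is sent to the unit circle, and composing the two maps returns the starting point up to rounding.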

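Proposition 20. We have

$$H(M_r) = (1 + O(r))\, H(M), \quad r \to 0.$$

Proof. From Lemmas 18 and 19 we deduce that

$$\max\left\{\frac{H(M_r)}{H(M)},\ \frac{H(M)}{H(M_r)}\right\} \le \left(1 + \frac{2r}{\rho_M - r}\right)^{2d},$$

for any $r < \rho_M := \mathrm{reach}(\partial M)$, which immediately yields the desired result. □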



3.6 $L^1$-metric on Borel sets

We will use the $L^1$-metric on Borel subsets of $\mathbb{R}^d$, defined by $\mathrm{Vol}_d(A \triangle B) = \int |1_A(x) - 1_B(x)|\, dx$. This metric comes from the bijection between Borel sets $A$ and their indicator functions $1_A$, endowed with the $L^1$-topology. Strictly speaking, this is a semi-metric on Borel subsets of $\mathbb{R}^d$ since $\mathrm{Vol}_d(A \triangle B) = 0$ if and only if $A \triangle B$ is a null set.
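As a small numerical illustration of this semi-metric, the Python sketch below approximates $\mathrm{Vol}_2(A \triangle B)$ for subsets of the unit square by a midpoint rule on a regular grid. The sets are passed as indicator functions; the grid size and the function name are arbitrary choices made for this example.

```python
def symmetric_difference_volume(in_a, in_b, m=100):
    """Midpoint-rule approximation of Vol_2(A symmetric-difference B) in the unit square."""
    count = 0
    for i in range(m):
        for j in range(m):
            p = ((i + 0.5) / m, (j + 0.5) / m)
            # The point contributes when exactly one indicator is true.
            if in_a(p) != in_b(p):
                count += 1
    return count / (m * m)
```

Two half-planes that differ only on the line $x_1 = 1/2$ are at distance zero, which reflects the fact that the distance only separates sets up to null sets.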

The following propositions are adapted from [23, Thm. 2.3.10] and [23, Prop. 2.3.6] respectively. Proposition 21 is a compactness criterion, and Proposition 22 results from lower semi-continuity of the perimeter measure with respect to the $L^1$-metric.

Proposition 21. Let $(E_n)$ be a sequence of measurable subsets of $M$. Suppose that

$$\limsup_{n \to \infty} \mathrm{Vol}_{d-1}(\partial E_n \cap M) < \infty.$$

Then $(E_n)$ admits a subsequence converging for the $L^1$-metric.

Proposition 22. Let $E_n$ and $E$ be bounded measurable subsets of $M$ such that $E_n \to E$ in $L^1$, and $h(E; M) < \infty$. Then

$$\liminf_{n} h(E_n; M) \ge h(E; M).$$

3.7 Proof of (i) in Theorem 2

Lower bound. For each $n$, let $R_n \in \mathcal{R}_n$ be such that

$$h_n(R_n) = \min_{R \in \mathcal{R}_n} h_n(R).$$

Then

$$h_n(R_n) - H(M) = [h_n(R_n) - h(R_n; M_{r_n})] + [h(R_n; M_{r_n}) - H(M_{r_n})] + [H(M_{r_n}) - H(M)]$$

$$\ge \inf_{R \in \mathcal{R}_n} \big(h_n(R) - h(R; M_{r_n})\big) + [H(M_{r_n}) - H(M)],$$

since $[h(R_n; M_{r_n}) - H(M_{r_n})] \ge 0$ by definition of $H(M_{r_n})$. On the last line, by Lemma 17 the first term has a non-negative inferior limit, and by Proposition 20 the second term tends to zero. Hence,

$$\liminf_{n \to \infty} \min_{R \in \mathcal{R}_n} h_n(R) \ge H(M) \quad \text{a.s.} \qquad (3.11)$$

Upper bound. To obtain the matching upper bound, fix a subset $A \subset M$ with smooth relative boundary and such that $0 < \mathrm{Vol}_d(A) \le \mathrm{Vol}_d(M \setminus A) < \mathrm{Vol}_d(M)$. Then, for $n$ large enough, there exists $R_n$ in $\mathcal{R}_n$ such that $R_n \cap M = A$, implying that

$$\min_{R \in \mathcal{R}_n} h_n(R) \le h_n(A).$$

By Theorem 1, $h_n(A) \to h(A; M)$ almost surely, so that

$$\limsup_{n \to \infty} \min_{R \in \mathcal{R}_n} h_n(R) \le h(A; M) \quad \text{a.s.}$$

By minimizing over $A$, we obtain

$$\limsup_{n \to \infty} \min_{R \in \mathcal{R}_n} h_n(R) \le H(M) \quad \text{a.s.} \qquad (3.12)$$

Combining the lower and upper bounds, (3.11) and (3.12), we conclude that

$$\lim_{n \to \infty} \min_{R \in \mathcal{R}_n} h_n(R) = H(M) \quad \text{a.s.} \qquad (3.13)$$

3.8 Proof of (ii) in Theorem 2

Let $R_n$ be a sequence with $R_n \in \mathcal{R}_n$ satisfying

$$h_n(R_n) = \min_{R \in \mathcal{R}_n} h_n(R),$$

and set $A_n = R_n \cap M$. Fix a subset $A^0 \subset M$ with smooth relative boundary and such that $h(A^0; M) < \infty$. Then for $n$ large enough, there exists $R$ in $\mathcal{R}_n$ such that $A^0 = R \cap M$. Hence $h_n(A_n) \le h_n(A^0)$, and since $h_n(A^0) \to h(A^0; M)$ by Theorem 1, we have

$$\limsup_n \mathrm{Vol}_{d-1}(\partial A_n \cap M) \le \limsup_n h(A_n; M) \min\{\mathrm{Vol}_d(A_n), \mathrm{Vol}_d(A_n^c \cap M)\} \le h(A^0; M)\, \mathrm{Vol}_d(M)/2.$$

Therefore, by compactness of the class of sets with bounded perimeters (Proposition 21), with probability one, $\{A_n\}$ admits a subsequence converging in the $L^1$-metric.

On the one hand,

$$h(A_n; M_{r_n}) - H(M) = [h(A_n; M_{r_n}) - H(M_{r_n})] + [H(M_{r_n}) - H(M)],$$

where the first difference term on the right-hand side is non-negative by definition, while the second difference term tends to zero by Proposition 20. So that with probability one:

$$\liminf_{n \to \infty} h(A_n; M_{r_n}) \ge H(M).$$

On the other hand,

$$h(A_n; M_{r_n}) - H(M) = [h(A_n; M_{r_n}) - h_n(A_n)] + [h_n(A_n) - H(M)] \le -\inf_{R \in \mathcal{R}_n} \big(h_n(R) - h(R; M_{r_n})\big) + [h_n(A_n) - H(M)],$$

so that

$$\limsup_{n \to \infty} \big(h(A_n; M_{r_n}) - H(M)\big) \le -\liminf_{n \to \infty} \inf_{R \in \mathcal{R}_n} \big(h_n(R) - h(R; M_{r_n})\big) + \limsup_{n \to \infty}\, [h_n(A_n) - H(M)],$$

and the right-hand side is at most 0 by (3.5) and (3.13). Hence

$$\lim_{n \to \infty} h(A_n; M_{r_n}) = H(M) \quad \text{a.s.}$$

Now let $f_n$ denote the bi-Lipschitz function mapping $M_{r_n}$ to $M$ defined in Lemma 19 with $r$ and $s$ replaced by $r_n$ and $s_n$, where $s_n/r_n \to \infty$. Define $B_n = f_n(A_n \cap M_{r_n})$. By Lemmas 18 and 19, we have

$$\max\left\{\frac{h(B_n; M)}{h(A_n; M_{r_n})},\ \frac{h(A_n; M_{r_n})}{h(B_n; M)}\right\} \le \left(1 + \frac{2r_n}{s_n - r_n}\right)^{2d},$$

so that $h(B_n; M) \to H(M)$ almost surely as $n \to \infty$. Moreover, by Proposition 21, with probability one, there exists a subset $B_\infty$ of $M$ and a subsequence $(B_{n_k})$ such that $B_{n_k}$ converges to $B_\infty$ in the $L^1$-metric. Since $h(\cdot\,; M)$ is lower semi-continuous by Proposition 22, with probability one, $\liminf_{k \to \infty} h(B_{n_k}; M) \ge h(B_\infty; M)$. Since we also have $\lim_{n \to \infty} h(B_n; M) = H(M)$ a.s., it follows that $h(B_\infty; M) = H(M)$ a.s., and so $B_\infty$ is a Cheeger set of $M$.

Moreover, since $f_n$ leaves $M_{s_n}$ unchanged,

$$\mathrm{Vol}_d(A_n \triangle B_n) \le \mathrm{Vol}_d(M \setminus M_{s_n}) \to 0 \quad \text{as } n \to \infty.$$

Hence with probability one, $1_{A_n} - 1_{B_n} \to 0$ in $L^1$. Consequently, the sequences $\{A_n\}$ and $\{B_n\}$ have the same accumulation points, and so any convergent subsequence of $\{A_n\}$ converges to a Cheeger set of $M$.

3.9 Proof of Theorem 3

Let $A_n = R_n \cap M$ and assume, without loss of generality, that $A_n \to A_\infty$ in $L^1$. For all $n \ge 1$, and all $f$ in the class of bounded and continuous functions on $M$, say $C_b(M)$, we have

$$\Big|Q_n f - \int_M f(x)\, 1_{R_n}(x)\, \mu(dx)\Big| \le \sup_{R} |P_n(f 1_R) - \mu(f 1_R)|,$$
where $P_n$ is the empirical measure of the sample $X_1, \dots, X_n$. Using the bound on the covering numbers in Lemma 13, it is a classical exercise to prove that the collection of functions $x \mapsto f(x) 1_R(x)$, indexed by the sets $R$ over which the supremum is taken, is a Glivenko-Cantelli class, whence

$$Q_n f - \int_M f(x)\, 1_{R_n}(x)\, \mu(dx) \to 0 \quad \text{a.s. as } n \to \infty.$$

Next,

$$\Big|\int_M f(x)\, 1_{A_n}(x)\, \mu(dx) - Qf\Big| \le \|f\|_\infty\, \mu(A_n \triangle A_\infty),$$

which tends to 0 by definition of $A_\infty$. Thus, we have shown that, for all $f$ in $C_b(M)$, $P(Q_n f \to Qf) = 1$. Using the separability of $C_b(M)$ [19, p. 131], we deduce that

$$P[\forall f \in C_b(M),\ Q_n f \to Qf] = 1,$$

so that the event "$Q_n$ converges weakly to $Q$" is of probability 1.